1. Introduction

Bertrand competition has been a prominent paradigm for the empirical study of differentiated
product markets for at least twenty years. Firms engaged in Bertrand competition
maximize profits by choosing prices for portfolios of differentiated products, and Bertrand-Nash
equilibrium prices simultaneously maximize profits for all firms. Models combining
Bertrand competition with the Mixed Logit discrete choice model of consumer demand
have been used to study the automotive industry, electronics, entertainment, and food
products and services; see Dube et al. (2002).

Many applications of Bertrand competition rely on counterfactual experiments: exercises
in which hypothetical market conditions are simulated with an estimated model. Such
experiments have been used to study corporate mergers (Nevo, 2000a), novel products and
services (Petrin, 2002; Goolsbee and Petrin, 2004; Beresteanu and Li, 2008), store locations
(Thomadsen, 2005), and regulatory policy changes (Goldberg, 1995, 1998; Beresteanu
and Li, 2008). By definition, simulating market outcomes in counterfactual experiments
requires computing equilibrium prices after changing the values of exogenous variables
such as the number of firms or the products offered. Numerical methods for computing
equilibrium prices have not yet received a thorough treatment in the literature, which currently
focuses on model specification and estimation; see Knittel and Metaxoglou (2008);
Dube et al. (2008); Su and Judd (2008) for recent developments in estimation. Morrow
and Skerlos (2010) fill this gap with a detailed investigation of four approaches for computing
Bertrand-Nash equilibrium prices in single-period, multi-firm models with Mixed
Logit demand. This working paper provides most of the technical background for that
investigation.

Applying Newton's method to some form of the first-order or "simultaneous stationarity"
condition is currently the de facto approach for computing equilibrium prices; see, for
example, Nevo (1997, 2000a); Petrin (2002); Smith (2004); Doraszelski and Draganska
(2006); Jacobsen (2006). Newton's method applied directly to the first-order condition
may converge when started at observed prices if changes in exogenous variables have a
marginal impact on equilibrium prices. However, when the changes to exogenous variables
imply significant changes in product prices, Newton's method applied directly to the
first-order condition may fail to compute equilibrium prices. Furthermore, analyses that do not
have observed prices to use as an initial guess will require methods with greater reliability.

Morrow and Skerlos (2010) demonstrate that solving fixed-point equations equivalent to
the first-order condition for equilibrium is more reliable and efficient than solving the
first-order condition itself. One fixed-point equation equivalent to the first-order condition
is the BLP-markup equation popularized by Berry et al. (1995). A second fixed-point
equation, here termed the $\zeta$-markup equation, is a novel way to write the same condition on
markups. Both markup equations lead to more robust numerical methods than a simple
application of Newton's method to the first-order condition. Using the fixed-point
expressions in this way can be considered "nonlinearly" or "analytically" pre-conditioning
the first-order condition satisfied by equilibrium prices, a technique well-known in applied
mathematics (Brown and Saad, 1990; Cai and Keyes, 2002).

The existence of fixed-point equations for equilibrium suggests applying fixed-point
iteration (Judd, 1998) to compute equilibrium prices, instead of Newton's method. The
BLP-markup equation does not appear to be well-suited to fixed-point iteration. Example
7 in Morrow and Skerlos (2010) provides a case in which iterating on the BLP-markup
equation is not necessarily locally convergent, while iterating on the $\zeta$-markup equation is
superlinearly locally convergent. Iterating on the $\zeta$-markup equation also eliminates the
need to solve linear systems, which is required both to implement Newton's method and to
iterate on the BLP-markup equation. This property makes fixed-point steps based on the
$\zeta$-markup equation very inexpensive relative to Newton steps, which is essential for obtaining
fast computations from generally linearly convergent fixed-point iterations.

Besides Newton's method and fixed-point iteration, few other practical approaches to
the computation of equilibrium prices exist. Variational formulations, widely applied in
economic and engineering problems (Ferris and Pang, 1997), contain many solutions that
need not be equilibria of the original problem. Explicit least-squares minimization or
Gauss-Newton methods can also be implemented, but have computational disadvantages relative to
applications of standard Newton-type methods for nonlinear systems. Some authors apply
tâtonnement (iterating on a game's best response correspondence) to compute equilibrium
in prices or other strategic variables including product mix (Choi et al., 1990), product
characteristics (CBO, 2003; Austin and Dinan, 2005; Bento et al., 2005), and engineering
variables (Michalek et al., 2004). Tâtonnement, however, has three issues: it requires the
iterative computation of profit-optimal prices (a special case of the problem discussed in
this article), should be inefficient relative to direct methods whenever optimal strategies
are coupled, and lacks the global convergence guarantees of contemporary Newton solvers.
Section 5 reviews these conclusions in more detail.

This article should be viewed as a companion to Morrow and Skerlos (2010); some of 
our notation and text may seem out of place without first reviewing that article. In several 
places, text from Morrow and Skerlos (2010) is repeated. 

2. A Technical Framework 

This section describes the mathematical framework employed in Morrow and Skerlos 
(2010). Several key assumptions are introduced and summarized in Table 1. 

2.1. Mathematical Notation. 

2.1.1. Sets. Table 2 lists some important sets and the symbols used for them. $\mathbb{N}$ denotes
the natural numbers $\{1, 2, \dots\}$, and $\mathbb{N}(N)$ denotes the natural numbers up to $N$, that is,
$\mathbb{N}(N) = \{1, \dots, N\}$. $\mathbb{R}$ denotes the set of real numbers $(-\infty, \infty)$, $[0, \infty)$ denotes the
non-negative real numbers, and $[0, \infty]$ denotes the extended non-negative half-line. We denote
the $(N-1)$-dimensional simplex $\{(x_1, \dots, x_N) \in [0,1]^N : \sum_{n=1}^N x_n = 1\}$ by $\mathbb{S}(N)$, and the
$N$-dimensional "pyramid" $\{(x_1, \dots, x_N) \in [0,1]^N : \sum_{n=1}^N x_n \leq 1\}$ by $\triangle(N)$. Hyper-rectangles
in $\mathbb{R}^N$, i.e. sets of the form $[a_1, b_1] \times \cdots \times [a_N, b_N]$ for some $a_n, b_n \in \mathbb{R}$ with $a_n < b_n$
for all $n \in \mathbb{N}(N)$, are denoted by $[\mathbf{a}, \mathbf{b}]$ where $\mathbf{a} = (a_1, \dots, a_N)$ and $\mathbf{b} = (b_1, \dots, b_N)$. $\mathcal{P}$
always denotes the non-negative numbers: $\mathcal{P} = [0, \infty)$. For other sets, we typically use

Table 1. List of important assumptions used in this section.

Assumption  Purpose

2.1         To provide a general form for utility functions
2.2         To ensure profits are bounded and vanish as prices increase without bound
2.3         To ensure the Leibniz Rule holds, validating Eqn. (9)
2.4         To ensure that $\eta$ is bounded. Implies the coercivity of $F_\eta$, $F_\zeta$
            and the existence of simultaneously stationary prices.
2.5         To ensure that $\zeta$ is bounded. Implies the coercivity of
            and the existence of simultaneously stationary prices.
3.1         To ensure that the derivatives of profit vanish as prices increase without bound
3.2         To ensure the coercivity of $F_\eta$, $F_\zeta$ under weaker conditions than
            Assumption 2.4.



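Table 2. Important sets.

Symbol          Definition               Description

$\mathbb{N}$    $\{1, 2, \dots\}$        Natural numbers
$\mathbb{R}$    $(-\infty, \infty)$      Real numbers
$\mathcal{P}$   $[0, \infty)$            Non-negative real numbers
$\mathcal{J}$   $\{1, \dots, J\}$        Set of product indices
$\mathcal{X}$   $\subset \mathbb{R}^K$   Set of product characteristics
$\mathcal{T}$   $\subset$                Set of individual characteristics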


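calligraphic upper case letters such as "$\mathcal{A}$". For any set $\mathcal{A}$, $|\mathcal{A}|$ denotes its cardinality. For
any $\mathcal{B} \subset \mathcal{A}$, $\mathcal{A} \setminus \mathcal{B}$ denotes the set $\{b \in \mathcal{A} : b \notin \mathcal{B}\}$.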

2.1.2. Symbols. Table 3 itemizes specific symbols used in the text. 

Bold, un-italicized symbols (e.g., "$\mathbf{x}$") denote vectors and matrices; typically we reserve
lower case letters to refer to vectors and use upper case letters to refer to matrices; the
vector of choice probabilities "$\mathbf{P}$" is an exception. Throughout we use $\mathbf{1}$ to denote a vector
of ones of the appropriate size for the context in which it appears. $\mathbf{I}$ always denotes the
identity matrix of a size appropriate for the context. For any $\mathbf{x} \in \mathbb{R}^N$, $\mathrm{diag}(\mathbf{x})$ denotes the
$N \times N$ diagonal matrix whose diagonal is $\mathbf{x}$. Any vector inequalities between vectors are
to be taken componentwise: for example, $\mathbf{x} < \mathbf{y}$ means $x_n < y_n$ for all $n$.

Random variables are denoted with capital letters "$X$", with random vectors being
denoted with bold capital letters (e.g., "$\mathbf{Q}$"). While this overlaps with our notation for
matrices, it should not cause any confusion. $\mathbb{P}$ denotes a probability and $\mathbb{E}$ denotes an
expectation. $\mathrm{ess\,sup}_\mu f$ denotes the essential supremum of the (measurable) function $f$
over $\mathcal{T}$, with respect to the measure $\mu$; see, e.g., Bartle (1966).

$\log$ always denotes the natural (base $e$) logarithm. We use the "Big-O" notation $O(g)$ as
follows: if there exists some $M < \infty$ such that $\limsup_{p \to q} [f(p)/g(p)] \leq M$, we say $f \in O(g)$;
the point $q$ is left implicit.

Table 3. Summary of important symbols.

Symbol                                                       Description                                             Defined in

Products (see Section 2.2)
$J \in \mathbb{N}$                                           number of products
$K \in \mathbb{N}$                                           number of non-price product characteristics
$\mathbf{x}_j \in \mathcal{X}$                               non-price characteristics of product $j$
$p_j \in \mathcal{P}$                                        price of product $j$
$\mathbf{p} \in \mathcal{P}^J$                               vector of all product prices

Individual Characteristics (see Section 2.2)
$\theta \in \mathcal{T}$                                     individual characteristics, including observed
                                                             demographics and "random coefficients"
$\mu$                                                        distribution of individual characteristics

Choice Probabilities (see Section 2.2)
$u_j(\theta, p_j) \in [-\infty, \infty)$                     utility of product $j$
$\vartheta(\theta) \in [-\infty, \infty)$                    utility of the outside good
$P^L_j(\theta, \mathbf{p}) \in [0, 1]$                       Logit choice probability for product $j$
$P_j(\mathbf{p}) \in [0, 1]$                                 Mixed Logit choice probability for product $j$
$\mathbf{P}(\mathbf{p}) \in [0, 1]^J$                        vector of Mixed Logit choice probabilities for all
                                                             products

Firms, Costs, Profits, and Stationarity (see Sections 2.4, 2.5)
$F \in \mathbb{N}$                                           number of firms
$\mathcal{J}_f \subset \mathcal{J}$                          indices of the products offered by firm $f$
$c_j \in \mathcal{P}$                                        (fixed) unit cost of product $j$
$\mathbf{c} \in \mathcal{P}^J$                               vector of all (fixed) unit costs
$\pi_f(\mathbf{p}) \in \mathbb{R}$                           expected profits for firm $f$                           Eqn. (2)
$(D_k \pi_f)(\mathbf{p}) \in \mathbb{R}$                     derivative of firm $f$'s profits, with respect to the   Eqn. (6)
                                                             price of product $k$
$(\tilde{\nabla} \pi)(\mathbf{p}) \in \mathbb{R}^J$          combined gradient of profits                            Prop. 2.2, Eqn. (7)

Choice Probability Derivatives (see Sections 2.5, 2.6)
$(D_k P_j)(\mathbf{p}) \in \mathbb{R}$                       derivative of product $j$'s choice probability
                                                             with respect to the price of product $k$
$(\tilde{D} \mathbf{P})(\mathbf{p}) \in \mathbb{R}^{J \times J}$   "intra-firm" Jacobian matrix of the choice        Eqn. (8)
                                                             probability vector
$\Lambda(\mathbf{p}), \Gamma(\mathbf{p}) \in \mathbb{R}^{J \times J}$   matrices appearing in our decomposition      Eqn. (9)
                                                             of $(D \mathbf{P})(\mathbf{p})$

Fixed-Point Equations (see Sections 2.7, 2.8)
$\eta(\mathbf{p}) \in \mathbb{R}^J$                          the BLP-markup function (Berry et al., 1995)            Eqn. (13)
$\zeta(\mathbf{p}) \in \mathbb{R}^J$                         our $\zeta$-markup function                             Eqn. (18)

2.1.3. Differentiation. Our conventions for denoting differentiation follow Munkres (1991).
We use the symbol "$D$" to denote differentiation, using subscripts to invoke additional
specificity. Letting $\mathbf{f} : \mathbb{R}^M \to \mathbb{R}^N$, $(D_m f_n)(\mathbf{x})$ denotes the derivative of the $n$th component
function with respect to the $m$th variable and $(D\mathbf{f})(\mathbf{x})$ is the $N \times M$ derivative matrix of $\mathbf{f}$
at $\mathbf{x}$ with components $((D\mathbf{f})(\mathbf{x}))_{n,m} = (D_m f_n)(\mathbf{x})$. Thus for $f : \mathbb{R}^M \to \mathbb{R}$, $(Df)(\mathbf{x})$ is a row
vector. If $f : \mathbb{R}^M \to \mathbb{R}$, we define the gradient $(\nabla f)(\mathbf{x}) \in \mathbb{R}^M$ as the transposed derivative:
$(\nabla f)(\mathbf{x}) = (Df)(\mathbf{x})^\top$.

2.2. Consumers, Products, and Choice Probabilities. A collection of $F \in \mathbb{N}$ firms
offer a total of $J \in \mathbb{N}$ products to a population of individuals (or households). Each product
$j \in \mathcal{J} = \{1, \dots, J\}$ is defined by a price, $p_j \in \mathcal{P} = [0, \infty)$, and a vector of $K \in \mathbb{N}$ product
"characteristics" $\mathbf{x}_j \in \mathcal{X} \subset \mathbb{R}^K$. Individuals are identified by a vector of characteristics $\theta$
from some set $\mathcal{T}$. These individual characteristics can include both observed demographics
and "random coefficients" (Berry et al., 1995; Nevo, 2000b; Train, 2003) that characterize
unobserved individual-specific heterogeneity with respect to preference for product
characteristics. The relative density of individual characteristic vectors in the population is
described by a probability distribution $\mu$ over $\mathcal{T}$.

An individual identified by $\theta \in \mathcal{T}$ receives the (random) utility

$U_j(\theta, \mathbf{x}_j, p_j) = u(\theta, \mathbf{x}_j, p_j) + \mathcal{E}_j$

from purchasing product $j \in \mathcal{J}$, and

$U_0(\theta) = \vartheta(\theta) + \mathcal{E}_0$

for forgoing purchase of any of these products; i.e. "purchasing the outside good."
Individuals choose the "product" $j \in \{0, \dots, J\}$ with maximum utility. Here $u : \mathcal{T} \times \mathcal{X} \times \mathcal{P} \to
[-\infty, \infty)$ is a systematic utility function, $\vartheta : \mathcal{T} \to \mathbb{R}$ is a valuation of the no-purchase option
or "outside good," and $\mathcal{E} = \{\mathcal{E}_j\}_{j=0}^J$ is a random vector of i.i.d. standard extreme value
variables. Section 2.3 below gives a general specification of utility functions appropriate
for equilibrium pricing. The basic requirements are that $u$ is continuously differentiable
and strictly decreasing in price, and without lower bound as prices increase.

Demand for each product $j$ is characterized by choice probabilities $P_j : \mathcal{P}^J \to [0, 1]$
derived from (random) utility maximization. Given the distributional assumption on $\mathcal{E}$,
the choice probabilities for an individual characterized by $\theta \in \mathcal{T}$ are those of the Logit
model (Train, 2003, Chapter 3):

(1)  $P^L_j(\theta, \mathbf{p}) = \dfrac{e^{u_j(\theta, p_j)}}{e^{\vartheta(\theta)} + \sum_{k=1}^J e^{u_k(\theta, p_k)}}$

The vector $\mathbf{p} \in \mathcal{P}^J$ denotes the vector of all product prices. Product-specific utility
functions $u_j : \mathcal{T} \times \mathcal{P} \to [-\infty, \infty)$ for all $j$, defined by $u_j(\theta, p) = u(\theta, \mathbf{x}_j, p)$ for all $(\theta, p) \in \mathcal{T} \times \mathcal{P}$,
are used in Eqn. (1) and in the following sections. The Mixed Logit choice probabilities
$P_j(\mathbf{p}) = \int P^L_j(\theta, \mathbf{p}) \, d\mu(\theta)$ follow from integrating over the distribution of individual
characteristics (Train, 2003, Chapter 6). The vector of Mixed Logit choice probabilities for all
products is denoted by $\mathbf{P}(\mathbf{p}) \in [0, 1]^J$.
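
As a minimal sketch of the Logit probabilities in Eqn. (1), the following Python function evaluates $P^L_j$ for one individual from a list of systematic utilities and an outside-good utility. The function name and argument layout are illustrative choices, not part of the original framework. Utilities equal to $-\infty$ (a price above the individual's reach, or an absent outside good) simply receive zero weight.

```python
import math

def logit_probabilities(utilities, outside_utility):
    """Return the Logit choice probabilities of the inside goods.

    utilities: systematic utilities u_j(theta, p_j), each possibly -inf.
    outside_utility: utility of the outside good, possibly -inf.
    """
    values = list(utilities) + [outside_utility]
    shift = max(values)
    if shift == -math.inf:
        raise ValueError("at least one option needs a finite utility")
    # Subtracting the maximum avoids overflow; exp(-inf) evaluates to 0.0.
    weights = [math.exp(v - shift) for v in values]
    total = sum(weights)
    # Drop the last entry, which belongs to the outside good.
    return [w / total for w in weights[:-1]]

# Two identical products and an outside good of equal utility share demand equally.
equal_split = logit_probabilities([0.0, 0.0], 0.0)
# A product with utility -inf is never chosen.
degenerate = logit_probabilities([1.0, -math.inf], -math.inf)
```

Averaging such probabilities over draws of $\theta$ gives a simulated Mixed Logit probability.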

The examples below review several instances of this choice model. Examples 1 and 2 are
used in Morrow and Skerlos (2010). Example 3 illustrates the type of general specification
used in estimation. Example 4 describes one kind of "simulation" of a Mixed Logit model
(Train, 2003).

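Example 1. (Boyd and Mellman, 1980) Take $\mathcal{T} = \mathcal{P} \times \mathbb{R}^K$, denoting $\theta = (\alpha, \boldsymbol{\beta})$ for $\alpha \in \mathcal{P}$
and $\boldsymbol{\beta} \in \mathbb{R}^K$. Set $u(\alpha, \boldsymbol{\beta}, \mathbf{x}, p) = -\alpha p + \boldsymbol{\beta}^\top \mathbf{x}$ and $\vartheta(\alpha, \boldsymbol{\beta}) = -\infty$ for all $(\alpha, \boldsymbol{\beta}) \in \mathcal{P} \times \mathbb{R}^K$.
$\mu$ is defined by specifying that $\alpha$ and $\boldsymbol{\beta}$ are independently lognormally distributed (with
appropriately chosen signs, means, and variances).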

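Example 2. (Berry et al., 1995) Take $\mathcal{T} = \mathcal{P} \times \mathbb{R}^K \times \mathbb{R}$, denoting $\theta = (\phi, \boldsymbol{\beta}, \beta_0)$ for $\phi \in \mathcal{P}$,
$\boldsymbol{\beta} \in \mathbb{R}^K$, and $\beta_0 \in \mathbb{R}$. Set

$u(\phi, \boldsymbol{\beta}, \mathbf{x}, p) = \alpha \log(\phi - p) + \boldsymbol{\beta}^\top \mathbf{x}$ if $p < \phi$, and $u(\phi, \boldsymbol{\beta}, \mathbf{x}, p) = -\infty$ otherwise,
and $\vartheta(\phi, \beta_0) = \alpha \log \phi + \beta_0$

for some fixed coefficient $\alpha > 0$. $\phi$ represents income and is given a lognormal distribution,
while the random coefficients $\boldsymbol{\beta}, \beta_0$ are independently normally distributed with some mean
and variance. Note that income ($\phi$) serves as an upper bound on the price an individual
can pay for any product.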

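Example 3. (Nevo, 2000b) Take $\mathcal{T} = \mathcal{P} \times \mathbb{R}^D \times \mathbb{R}^{K+2}$, denoting $\theta = (\phi, \mathbf{d}, \boldsymbol{\nu})$ for $\phi \in \mathcal{P}$,
$\mathbf{d} \in \mathbb{R}^D$, and $\boldsymbol{\nu} \in \mathbb{R}^{K+2}$. Again, $\phi$ represents income; $\mathbf{d} \in \mathbb{R}^D$ represents a vector of $D$
observed demographic variables (which may include income); $\boldsymbol{\nu} \in \mathbb{R}^{K+2}$ represents a vector
of $K + 2$ random coefficients: one for each product characteristic, one for price, and one
for the outside good. Set

$u(\phi, \mathbf{d}, \boldsymbol{\nu}, \mathbf{x}, p) = (\alpha + \boldsymbol{\pi}_p^\top \mathbf{d} + \boldsymbol{\sigma}_p^\top \boldsymbol{\nu})(\phi - p) + (\boldsymbol{\beta} + \boldsymbol{\Pi} \mathbf{d} + \boldsymbol{\Sigma} \boldsymbol{\nu})^\top \mathbf{x}$
$\vartheta(\phi, \mathbf{d}, \boldsymbol{\nu}) = (\alpha + \boldsymbol{\pi}_p^\top \mathbf{d} + \boldsymbol{\sigma}_p^\top \boldsymbol{\nu}) \phi + \boldsymbol{\pi}_0^\top \mathbf{d} + \boldsymbol{\sigma}_0^\top \boldsymbol{\nu}$

where $\alpha \in \mathbb{R}$, $\boldsymbol{\beta} \in \mathbb{R}^K$, $\boldsymbol{\pi}_p, \boldsymbol{\pi}_0 \in \mathbb{R}^D$, $\boldsymbol{\Pi} \in \mathbb{R}^{K \times D}$, $\boldsymbol{\sigma}_p, \boldsymbol{\sigma}_0 \in \mathbb{R}^{K+2}$ and $\boldsymbol{\Sigma} \in \mathbb{R}^{K \times (K+2)}$ are
coefficients. The distribution of $\mathbf{d}$ is estimated from available data (e.g., Census data) and
$\boldsymbol{\nu}$ is assumed to be standard independent multivariate normal. When the coefficient on
price, $-(\alpha + \boldsymbol{\pi}_p^\top \mathbf{d} + \boldsymbol{\sigma}_p^\top \boldsymbol{\nu})$, is positive, an individual prefers higher prices.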

Petrin (2002) and Berry et al. (2004) adopt similar specifications that eliminate this
counterintuitive property. Petrin (2002) takes the price component of utility to be $\alpha(\phi) \log(\phi - p)$,
where $\alpha : \mathcal{P} \to \mathcal{P}$ is a step function. Berry et al. (2004) take the price component of
utility to be $\alpha p$, but define $\alpha = -e^{-\alpha - \boldsymbol{\pi}_p^\top \mathbf{d} - \boldsymbol{\sigma}_p^\top \boldsymbol{\nu}}$.

Example 4. (Simulation). Take any of the examples above, and draw $S \in \mathbb{N}$ vectors
$\theta_s \in \mathcal{T}$ according to the distribution $\mu$. Let $\mathcal{T}' = \{\theta_s\}_{s=1}^S$ and define a probability measure
$\mu'$ over $\mathcal{T}'$ by $\mu'(\theta_s) = 1/S$ for all $s$. Then $(u, \vartheta, \mathcal{T}', \mu')$ defines a simulator of the "full"
Mixed Logit model with $(u, \vartheta, \mathcal{T}, \mu)$; see Train (2003). These approximations are essential
in estimation of Mixed Logit models and in computations of equilibrium prices.
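
The simulator of Example 4 can be sketched for the Boyd and Mellman utility of Example 1. The lognormal parameters, the seed, and the function names below are illustrative assumptions rather than values from any cited study. Because $\vartheta = -\infty$ in Example 1, the outside good receives zero weight and the simulated shares sum to one.

```python
import math
import random

def simulate_draws(num_draws, num_chars, seed=0):
    """Draw theta_s = (alpha_s, beta_s) with independent lognormal entries.

    The log-mean 0.0 and log-standard-deviation 0.5 are arbitrary choices.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(num_draws):
        alpha = rng.lognormvariate(0.0, 0.5)
        beta = [rng.lognormvariate(0.0, 0.5) for _ in range(num_chars)]
        draws.append((alpha, beta))
    return draws

def simulated_mixed_logit(prices, chars, draws):
    """Average Logit probabilities over equally weighted draws (mu' = 1/S)."""
    num_products = len(prices)
    shares = [0.0] * num_products
    for alpha, beta in draws:
        # u(alpha, beta, x, p) = -alpha * p + beta' x, as in Example 1.
        utils = [-alpha * prices[j] + sum(b * x for b, x in zip(beta, chars[j]))
                 for j in range(num_products)]
        top = max(utils)
        weights = [math.exp(u - top) for u in utils]
        total = sum(weights)  # the outside good has utility -inf and adds nothing
        for j in range(num_products):
            shares[j] += weights[j] / total / len(draws)
    return shares

draws = simulate_draws(200, 1, seed=1)
shares = simulated_mixed_logit([1.0, 2.0], [[1.0], [1.0]], draws)
```

With identical characteristics, the cheaper product is preferred by every simulated individual, so its share is the larger of the two.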

2.3. Utility Specification. This section presents a generalization of the systematic utility
functions used in the examples given in the text, a specification closely related to the one
introduced by Caplin and Nalebuff (1991). Morrow (2008); Morrow and Skerlos (2008) use
a similar specification to analyze equilibrium prices in simple Logit models.

Assumption 2.1. For all $j$, there exist functions $w_j : \mathcal{T} \times \mathcal{P} \to [-\infty, \infty)$ and $v_j :
\mathcal{T} \to (-\infty, \infty)$ such that the systematic utility function $u_j : \mathcal{T} \times \mathcal{P} \to [-\infty, \infty)$ can be
written $u_j(\theta, p) = w_j(\theta, p) + v_j(\theta)$. Furthermore there exists $\varsigma : \mathcal{T} \to (0, \infty]$ such that
$w_j : \mathcal{T} \times [0, \infty] \to [-\infty, \infty)$ satisfies, for all $j$ and $\mu$-almost every (a.e.) $\theta \in \mathcal{T}$,

(a) $w_j(\theta, \cdot) : (0, \varsigma(\theta)) \to [-\infty, \infty)$ is continuously differentiable, strictly decreasing,
and finite

(b) $w_j(\theta, p) = -\infty$ for all $p \geq \varsigma(\theta)$, and

(c) $w_j(\theta, p) \downarrow -\infty$ as $p \uparrow \varsigma(\theta)$.
$v_j : \mathcal{T} \to (-\infty, \infty)$ is arbitrary.

Note that we have not restricted $\mu$, the distribution of individual characteristics, with
Assumption 2.1. Important examples of $\mu$ from the econometrics and marketing literature
include finitely supported distributions (often empirical frequency distributions for
integral observed demographic variables), standard continuous distributions (e.g. normal,
lognormal and $\chi^2$), truncated standard continuous distributions, finite mixtures of standard
continuous distributions, and independent products of any of these types of distributions.
This generality allows us to address a wide variety of otherwise disparate examples with
a single notation. In particular, this generality allows us to use a single framework to
treat both "full" Mixed Logit models defined by some $\mu$ with uncountable support and
simulation-based approximations to such models.

Some existing empirical specifications violate Assumption 2.1 by admitting positive price
coefficients for $\theta \in \mathcal{T}' \subset \mathcal{T}$, where $\mathcal{T}'$ has nonzero $\mu$-measure. See, for example, Nevo
(2000a) (Example 3) or Brownstone et al. (2000). This implies that $w(\theta, \cdot)$ is increasing
for $\theta \in \mathcal{T}'$. If $w(\theta, \cdot)$ is not decreasing for $\mu$-a.e. $\theta$, or at least eventually decreasing for $\mu$-a.e.
$\theta$ in the sense that there are always prices large enough to ensure that $w(\theta, \cdot)$ is decreasing
for $\mu$-a.e. $\theta$, then profit-optimal pricing is not a well-posed problem and finite equilibrium
prices will not exist.

The variable $\Sigma = \varsigma(\Theta)$ represents an individual-specific reservation price. As in the
Berry et al. (1995) model of Example 2, this reservation price is most often derived from
household or individual income. Correspondingly, $\Sigma$ is often given a lognormal distribution
to (roughly) fit empirical income data. In principle, this reservation price could be related
to purchasing power derived from observed demographic variables other than income, or
unobserved demographic variables such as family wealth. Thus we allow this reservation
price to be specified as a function of all "demographic" characteristics, $\theta$. Conditions (b)
and (c) in Assumption 2.1 imply that the probability an individual characterized by $\theta$ will
purchase a product is zero for any price above $\varsigma(\theta)$ and vanishes as the price approaches
$\varsigma(\theta)$. We set $\varsigma_* = \mathrm{ess\,sup}\, \varsigma$ and allow, but do not require, $\varsigma_* = \infty$. For example,
simulation-based approximations to the Berry et al. demand model have $\varsigma_* < \infty$, as can be easily
checked.



FIXED-POINT APPROACHES TO COMPUTING BERTRAND-NASH EQUILIBRIUM PRICES 9 

Note also that Condition (c) in Assumption 2.1 ensures the continuity of $P^L_j(\theta, \mathbf{p})$ at
any vector of prices with some component equal to $\varsigma(\theta)$. We must require this of the
Logit choice probabilities to obtain Mixed Logit choice probabilities that are continuous
on $(0, \varsigma_*)^J$ for the important class of simulation-based approximations with finitely
supported $\mu$. Continuous Logit choice probabilities also imply continuous Mixed Logit choice
probabilities, by the Dominated Convergence Theorem.

2.4. Profits. To describe the optimal pricing problems faced by each firm we use the
following notation. Let $F \in \mathbb{N}$ denote the number of firms. For each $f \in \{1, \dots, F\}$, there
exists a set $\mathcal{J}_f \subset \mathcal{J}$ of indices that corresponds to the $J_f = |\mathcal{J}_f|$ products offered by firm $f$.
The collection of all these sets, $\{\mathcal{J}_f\}_{f=1}^F$, forms a partition of $\mathcal{J}$. Subsequently, in writing
"$f(j)$" for some $j \in \mathcal{J}$, we mean the unique $f \in \{1, \dots, F\}$ such that $j \in \mathcal{J}_f$. The vector
$\mathbf{p}_f \in \mathbb{R}^{J_f}$ refers to the vector of prices of the products offered by firm $f$. Negative subscripts
denote competitors' variables as in, for instance, $\mathbf{p}_{-f} \in \mathbb{R}^{J_{-f}}$, where $J_{-f} = \sum_{g \neq f} J_g$,
the vector of prices for products offered by all of firm $f$'s competitors. Firm-specific choice
probability functions are denoted by $\mathbf{P}_f(\mathbf{p}) \in \mathbb{R}^{J_f}$.

Two additional assumptions are required to complete the definition of firms' profits in
a manner consistent with empirical applications of Bertrand competition. First, we must
specify unit and fixed costs: for each product $j$ there exists a unit cost $c_j \in \mathcal{P}$ and for each
firm there exists a fixed cost $c_f^F \in \mathcal{P}$. Both $c_j$ and $c_f^F$ depend only on the collection of
product characteristics chosen by the firm, and not on the quantity sold by the firm during
the purchasing period, for the reasons discussed below. We let $\mathbf{c}_f \in \mathcal{P}^{J_f}$ denote the vector
of unit costs for the products offered by firm $f$, and $\mathbf{c} \in \mathcal{P}^J$ denote the vector of unit costs
for all products.

Second, Bertrand competition entails the following "commitment" assumption on the
quantities produced (Baye and Kovenock, 2008). Let $Q_j(\mathbf{p})$ denote the (random) quantity
of product $j$ that the population will demand during the purchasing period, given prices
for all products $\mathbf{p}$. These random demands are derived from random utility maximization.
We assume each firm commits to producing exactly $Q_j(\mathbf{p})$ units of each product $j \in \mathcal{J}_f$
during the purchasing period. This implies either that there are no production capacity
constraints that limit a firm's ability to meet any demands that arise during the purchase
period, or that production backlogs do not affect demand.

With the commitment and constant costs assumptions, the total cost firm $f$ incurs in
producing (and selling) $Q_j(\mathbf{p})$ units of each product $j \in \mathcal{J}_f$ during the purchasing period is given
by the random variable

$\sum_{j \in \mathcal{J}_f} c_j Q_j(\mathbf{p}) + c_f^F$.

Random revenues are, of course, given by $\sum_{j \in \mathcal{J}_f} Q_j(\mathbf{p}) p_j$. The random variable

$\Pi_f(\mathbf{p}) = \sum_{j \in \mathcal{J}_f} Q_j(\mathbf{p})(p_j - c_j) - c_f^F$

then gives firm $f$'s (random) profits for the purchasing period as a function of all product
prices. Following most of the theoretical and empirical literature in both marketing and
economics, we assume that firms take expected profits,

(2)  $\mathbb{E}[\Pi_f(\mathbf{p})] = I \pi_f(\mathbf{p}) - c_f^F$  where  $\pi_f(\mathbf{p}) = \sum_{j \in \mathcal{J}_f} P_j(\mathbf{p})(p_j - c_j)$

as the metric by which they optimize their pricing decisions in this stochastic optimization
problem. Here $I \in \mathbb{N}$ denotes the number of individuals in the population.

Eqn. (2) demonstrates that neither the total firm fixed costs $c_f^F$ nor the population
size $I$ play a role in determining the prices that maximize expected profits under the
assumptions described above. Henceforth we focus on the "population-normalized gross
expected profits" $\pi_f(\mathbf{p})$, referred to in the text and below as simply "profits". Firms thus
solve

(3)  maximize $\pi_f(\mathbf{p}) = \sum_{j \in \mathcal{J}_f} P_j(\mathbf{p})(p_j - c_j)$
     with respect to $\mathbf{p}_f \in \mathcal{P}^{J_f}$
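
A minimal sketch of Eqns. (2) and (3) in Python follows, assuming the choice probabilities have already been computed. The `owner` list (mapping each product to its firm index) and both function names are hypothetical conveniences. The second function makes explicit that the population size and the fixed cost only rescale and shift profits, so they cannot change the maximizing prices.

```python
def firm_profits(probabilities, prices, costs, owner, num_firms):
    """Population-normalized profits pi_f = sum over firm f's products of P_j (p_j - c_j)."""
    profits = [0.0] * num_firms
    for prob, price, cost, firm in zip(probabilities, prices, costs, owner):
        profits[firm] += prob * (price - cost)
    return profits

def expected_profit(normalized_profit, population, fixed_cost):
    """Expected profits E[Pi_f] = I * pi_f - c_f^F from Eqn. (2)."""
    return population * normalized_profit - fixed_cost

# Three products; firm 0 offers the first two and firm 1 offers the third.
pi = firm_profits([0.2, 0.3, 0.1], [10.0, 12.0, 8.0], [6.0, 7.0, 5.0], [0, 0, 1], 2)
total = expected_profit(pi[0], 1000, 500.0)
```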

Before continuing with our framework, we discuss quantity-dependent costs and clarify 
when profits are bounded. 

2.4.1. Quantity-Dependent Costs. Including costs that depend on quantities produced is
certainly possible, though this should introduce extra terms into the first-order equations
presented below (Eqn. (7)). Generally speaking, unit costs that depend on the quantity
produced would be expressed as $c_j : \mathbb{Z}_+ \to \mathcal{P}$, and unit costs that depend on the expected
quantity produced would be expressed as $c_j : \mathcal{P} \to \mathcal{P}$. If unit costs depend on the quantity
produced, then product $j$'s unit costs for the purchasing period (i) are random and (ii)
depend on prices. To see this, simply note that product $j$'s unit costs for the purchasing
period are $c_j(Q_j(\mathbf{p}))$. Assuming quantity-dependent costs also obscures expected profits,
since there are now nonlinear terms $Q_j(\mathbf{p}) c_j(Q_j(\mathbf{p}))$ in the formula for random profits.
If unit costs depend only on the expected quantity produced, then unit costs are not
random but still depend on prices: $c_j(\mathbb{E}[Q_j(\mathbf{p})]) = c_j(I P_j(\mathbf{p}))$. In either case the derivatives
of unit costs with respect to prices should appear in the first-order conditions. This is
acknowledged in the theoretical literature. As these terms have not yet been included in
the empirical literature, even when costs are assumed to depend on quantities produced
(Berry et al., 1995; Petrin, 2002), we focus on costs that are independent of the quantity
produced.
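
To illustrate the extra terms, consider the case of unit costs that depend on expected quantity, and assume (this is not part of the framework used in the rest of the text) that each $c_j$ is differentiable with derivative $c_j'$. Differentiating $\pi_f(\mathbf{p}) = \sum_{j \in \mathcal{J}_f} P_j(\mathbf{p}) \bigl( p_j - c_j(I P_j(\mathbf{p})) \bigr)$ with respect to $p_k$ by the product and chain rules gives the following sketch of the modified stationarity condition:

```latex
(D_k \pi_f)(\mathbf{p})
  = \sum_{j \in \mathcal{J}_f} (D_k P_j)(\mathbf{p}) \bigl( p_j - c_j(I P_j(\mathbf{p})) \bigr)
  + P_k(\mathbf{p})
  - I \sum_{j \in \mathcal{J}_f} P_j(\mathbf{p}) \, c_j'(I P_j(\mathbf{p})) \, (D_k P_j)(\mathbf{p}),
  \qquad k \in \mathcal{J}_f .
```

The last sum vanishes when unit costs are independent of quantity, which is the case treated in the remainder of the text.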

2.4.2. Bounded and Vanishing Profits. Here we present a technical assumption that ensures
that profits are not only bounded, but vanish as all prices approach $\varsigma_*$.

Assumption 2.2. For all $j$ there exists some $r_j : \mathcal{T} \to (1, \infty)$ and some $\bar{p}_j : \mathcal{T} \to \mathcal{P}$
satisfying

(4)  $\sup \{ \, p \, \mu(\{\theta : \bar{p}_j(\theta) > p\}) : p \in (0, \varsigma_*) \, \} < \infty$

such that

(5)  $u_j(\theta, p) \leq -r_j(\theta) \log p + \vartheta(\theta)$

for all $p > \bar{p}_j(\theta)$, $\mu$-a.e.

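Lemma 2.1. If Assumption 2.2 holds, then $\pi_f(\mathbf{p})$ is bounded in $\mathbf{p}$ and vanishes as $\mathbf{p}_f \to
\varsigma_* \mathbf{1} \in \mathbb{R}^{J_f}$.

Proof. We use the Dominated Convergence Theorem. Eqn. (5) ensures that $p_j P^L_j(\theta, \mathbf{p})$
vanishes $\mu$-a.e. as $p_j \uparrow \varsigma_*$; see also Morrow and Skerlos (2008). Eqn. (4) ensures that

$\int p_j P^L_j(\theta, \mathbf{p}) \, d\mu(\theta)$

is bounded as prices approach $\varsigma_*$, as we now show.
The key quantities in this integral are

$p_j P^L_j(\theta, \mathbf{p})$;

the $c_j P_j(\mathbf{p})$ terms vanish if $p_j \uparrow \varsigma_*$, since $P_j(\mathbf{p})$ vanishes. We must show that these terms
are bounded as $p_j \uparrow \varsigma_*$. By assumption,

$P^L_j(\theta, \mathbf{p}) \leq p_j^{-r_j(\theta)}$

for all $p_j > \bar{p}_j(\theta)$. Thus we write

$\int p_j P^L_j(\theta, \mathbf{p}) \, d\mu(\theta)
  = \int_{\{\theta : p_j \leq \bar{p}_j(\theta)\}} p_j P^L_j(\theta, \mathbf{p}) \, d\mu(\theta)
  + \int_{\{\theta : p_j > \bar{p}_j(\theta)\}} p_j P^L_j(\theta, \mathbf{p}) \, d\mu(\theta)$
$\quad \leq p_j \, \mu\{\theta : p_j \leq \bar{p}_j(\theta)\}
  + \int_{\{\theta : p_j > \bar{p}_j(\theta)\}} p_j^{1 - r_j(\theta)} \, d\mu(\theta)$.

By Eqn. (4), the first term is bounded. We take $p_j > 1$, without loss of generality, so that
$1 / p_j^{r_j(\theta) - 1} < 1$ for $\mu$-a.e. $\theta$ and the second term is bounded. □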
We now make some remarks regarding Assumption 2.2. 

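Note that if $\varsigma(\theta) < \infty$ then Eqn. (5) holds for any $r(\theta) > 1$ by taking $\bar{p}(\theta) = \varsigma(\theta)$. If
$\varsigma(\theta) = \infty$, Eqn. (5) admits any utility function $u(\theta, \cdot)$ that is (eventually) concave in price.

If $\varsigma_* < \infty$, then $\varsigma(\theta) < \infty$ for $\mu$-a.e. $\theta$. Furthermore, Eqn. (4) is trivial.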

To further analyze Eqn. (4), we assume $\varsigma_* = \infty$. We define $Z = \bar{p}(\Theta)$, where $\Theta$ is the
$\mathcal{T}$-valued random variable with $\mathbb{P}(\Theta \in \mathcal{A}) = \mu(\mathcal{A}) = \int_{\mathcal{A}} d\mu(\theta)$. If $\varsigma(\theta) < \infty$ for $\mu$-a.e.
$\theta$, then we can take $Z = \Sigma = \varsigma(\Theta)$. Eqn. (4) can be re-written as $\sup\{ p \, \mathbb{P}(Z > p) :
p \in (0, \infty) \} < \infty$, or equivalently $\limsup_{p \to \infty} [p \, \mathbb{P}(Z > p)] < \infty$. Eqn. (4) admits any $Z$
with finite expectation, and even admits any $Z$ with a "fat-tailed" distribution satisfying
$p^{1+\beta} \, \mathbb{P}(Z > p) \to 1$ as $p \to \infty$ for some $\beta > 0$. Eqn. (4) can be written $\mathbb{P}(Z > p) = O(1/p)$.
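
The tail condition can be explored numerically. The sketch below evaluates $p \, \mathbb{P}(Z > p)$ on a logarithmic grid for three illustrative distributions chosen here as assumptions: a standard lognormal (finite expectation), a Pareto tail $\mathbb{P}(Z > p) = 1/p$ (the boundary case of $O(1/p)$), and a Pareto tail $\mathbb{P}(Z > p) = p^{-1/2}$, which is too heavy. All function names are hypothetical.

```python
import math

def lognormal_tail(p, mu=0.0, sigma=1.0):
    """P(Z > p) for a lognormal Z with log-mean mu and log-standard-deviation sigma."""
    if p <= 0.0:
        return 1.0
    return 0.5 * math.erfc((math.log(p) - mu) / (sigma * math.sqrt(2.0)))

def pareto_tail(p, index):
    """P(Z > p) = min(1, p**(-index)) for a Pareto variable with unit scale."""
    return 1.0 if p <= 1.0 else p ** (-index)

def scaled_tail(tail, grid):
    """Evaluate p * P(Z > p) at every price in the grid."""
    return [p * tail(p) for p in grid]

# Prices from 0.01 to 1e8, four grid points per decade.
grid = [10.0 ** (k / 4.0) for k in range(-8, 33)]
lognormal_values = scaled_tail(lognormal_tail, grid)
boundary_values = scaled_tail(lambda p: pareto_tail(p, 1.0), grid)
too_heavy_values = scaled_tail(lambda p: pareto_tail(p, 0.5), grid)
```

On this grid the lognormal and boundary Pareto values stay bounded, while the heavier tail grows like $\sqrt{p}$, so only the first two are consistent with the supremum in Eqn. (4) being finite.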

2.5. Local Equilibrium and the Simultaneous Stationarity Conditions. Assuming
that the choice probabilities are continuously differentiable in prices, at equilibrium each
firm's prices satisfy the stationarity condition

(6)  $(D_k \pi_f)(\mathbf{p}) = \sum_{j \in \mathcal{J}_f} (D_k P_j)(\mathbf{p})(p_j - c_j) + P_k(\mathbf{p}) = 0$  for all $k \in \mathcal{J}_f$.

Combining the stationarity condition for each firm we obtain the Simultaneous Stationarity
Condition, a first-order (necessary) condition for local equilibrium prices.

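Proposition 2.2 (Simultaneous Stationarity Condition). Suppose $\mathbf{P}$ is continuously
differentiable. Let $(\tilde{\nabla} \pi)(\mathbf{p}) \in \mathbb{R}^J$ denote the "combined gradient" with components $((\tilde{\nabla} \pi)(\mathbf{p}))_j =
(D_j \pi_{f(j)})(\mathbf{p})$ where $f(j)$ denotes the index of the firm offering product $j$. If $\mathbf{p}$ is a local
equilibrium, then

(7)  $(\tilde{\nabla} \pi)(\mathbf{p}) = (\tilde{D} \mathbf{P})(\mathbf{p})^\top (\mathbf{p} - \mathbf{c}) + \mathbf{P}(\mathbf{p}) = \mathbf{0}$

where $(\tilde{D} \mathbf{P})(\mathbf{p}) \in \mathbb{R}^{J \times J}$ is the "intra-firm" Jacobian matrix of price derivatives of the
choice probabilities defined by

(8)  $((\tilde{D} \mathbf{P})(\mathbf{p}))_{j,k} = (D_k P_j)(\mathbf{p})$ if products $j$ and $k$ are offered by the same firm,
     and $((\tilde{D} \mathbf{P})(\mathbf{p}))_{j,k} = 0$ otherwise.

Prices $\mathbf{p}$ satisfying Eqn. (7) are called "simultaneously stationary."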
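
The combined gradient in Eqn. (7) can be sketched for the simplest special case: a single individual type with linear-in-price utility $u_j = v_j - \alpha p_j$ and outside-good utility $0$, for which $(D_k P_j) = -\alpha P_j (\delta_{j,k} - P_k)$. This simple Logit setting, the `owner` list, and all function names are assumptions made for illustration, not the general Mixed Logit case.

```python
import math

def logit_shares(prices, values, alpha):
    """Simple Logit shares with u_j = values[j] - alpha * p_j and outside utility 0."""
    weights = [math.exp(v - alpha * p) for p, v in zip(prices, values)]
    denom = 1.0 + sum(weights)
    return [w / denom for w in weights]

def combined_gradient(prices, costs, values, alpha, owner):
    """Component k is the derivative of firm f(k)'s profit with respect to p_k."""
    shares = logit_shares(prices, values, alpha)
    grad = []
    for k in range(len(prices)):
        total = shares[k]
        for j in range(len(prices)):
            if owner[j] == owner[k]:  # intra-firm sparsity of the Jacobian
                delta = 1.0 if j == k else 0.0
                dk_pj = -alpha * shares[j] * (delta - shares[k])  # (D_k P_j)(p)
                total += dk_pj * (prices[j] - costs[j])
        grad.append(total)
    return grad

def firm_profit(prices, costs, values, alpha, owner, firm):
    """Profit of one firm, used only to check the gradient numerically."""
    shares = logit_shares(prices, values, alpha)
    return sum(s * (p - c) for s, p, c, f in zip(shares, prices, costs, owner) if f == firm)

prices, costs, values = [2.0, 2.5, 3.0], [1.0, 1.2, 1.5], [1.0, 1.5, 2.0]
owner, alpha = [0, 0, 1], 1.0
gradient = combined_gradient(prices, costs, values, alpha, owner)
```

Prices are simultaneously stationary exactly when every entry of `gradient` is zero; the example prices above are arbitrary and are not claimed to be an equilibrium.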

The matrix $-(\tilde{D} \mathbf{P})(\mathbf{p})$ has previously been denoted by "$\Delta$" (Berry et al., 1995; Petrin,
2002; Beresteanu and Li, 2008) and "$\Omega$" (Nevo, 2000a); see also Dube et al. (2002). We prefer
the "$D$" notation to emphasize the relationship of $(\tilde{D} \mathbf{P})(\mathbf{p})$ to the Jacobian matrix of the
choice probabilities $\mathbf{P}$, while using the superscript "~" to denote the intra-firm sparsity
structure.

A set of simultaneously stationary prices is a local equilibrium only if every firm's profits
are locally maximized at those prices; this can be verified by confirming that every firm's
profits are locally concave (Section ??). Note that there is no convenient condition to verify
that every firm's profits are globally maximized at a particular local equilibrium. That is,
there is no convenient condition to ensure that certain prices are a proper equilibrium.

2.6. Choice Probability Derivatives. In this section we examine the price derivatives
of Mixed Logit choice probabilities. In what follows, $(Dw_j)(\theta, p_j)$ denotes the derivative of
the price component of utility, $w_j$, with respect to price.

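Proposition 2.3. Fix $\mathbf{p} \in (0, \varsigma_*)^J$, let $u_j$ be given as in Assumption 2.1 for all $j$, and
suppose the Leibniz Rule holds for the Mixed Logit choice probabilities $P_j(\mathbf{p}) = \int P^L_j(\theta, \mathbf{p}) \, d\mu(\theta)$;
that is, $(D_k P_j)(\mathbf{p}) = \int (D_k P^L_j)(\theta, \mathbf{p}) \, d\mu(\theta)$. Then the Jacobian matrix of $\mathbf{P}$ is given by

(9)  $(D \mathbf{P})(\mathbf{p}) = \Lambda(\mathbf{p}) - \Gamma(\mathbf{p})$

where $\Lambda(\mathbf{p}) \in \mathbb{R}^{J \times J}$ is the diagonal matrix with diagonal entries

$\lambda_j(\mathbf{p}) = \int_{\mathcal{C}(p_j)} (Dw_j)(\theta, p_j) P^L_j(\theta, \mathbf{p}) \, d\mu(\theta)$,  $\mathcal{C}(p) = \{\theta : \varsigma(\theta) > p\}$

and $\Gamma(\mathbf{p})$ is the full $J \times J$ matrix with entries

$\gamma_{j,k}(\mathbf{p}) = \int_{\mathcal{C}(p_j, p_k)} P^L_j(\theta, \mathbf{p}) P^L_k(\theta, \mathbf{p}) (Dw_k)(\theta, p_k) \, d\mu(\theta)$,  $\mathcal{C}(p, q) = \mathcal{C}(p) \cap \mathcal{C}(q)$.

The intra-firm price derivatives of the Mixed Logit choice probabilities are given by $(\tilde{D} \mathbf{P})(\mathbf{p}) =
\Lambda(\mathbf{p}) - \tilde{\Gamma}(\mathbf{p})$ where $(\tilde{\Gamma}(\mathbf{p}))_{j,k} = \gamma_{j,k}(\mathbf{p})$ if $f(j) = f(k)$ and $(\tilde{\Gamma}(\mathbf{p}))_{j,k} = 0$ otherwise.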

Proof. We first characterize the Logit choice probabilities. For all $j, k$ we have

$(D_k P^L_j)(\theta, \mathbf{p}) = P^L_j(\theta, \mathbf{p}) (\delta_{j,k} - P^L_k(\theta, \mathbf{p})) (Dw_k)(\theta, p_k)$

$\qquad = \delta_{j,k} P^L_k(\theta, \mathbf{p}) (Dw_k)(\theta, p_k) - P^L_j(\theta, \mathbf{p}) P^L_k(\theta, \mathbf{p}) (Dw_k)(\theta, p_k)$

for any $\theta \in \mathcal{C}(p_k)$ and $(D_k P^L_j)(\theta, \mathbf{p}) = 0$ for any $\theta \in \{\theta' \in \mathcal{T} : p_k > \varsigma(\theta')\}$ (because
$P^L_k(\theta, \cdot)$ is identically zero in a neighborhood of $\mathbf{p}$). Neglecting values $\theta \in \varsigma^{-1}(p_k)$ for
the moment, we observe that these formulae and the Leibniz rule generate the desired
expression for the Mixed Logit choice probabilities.

We complete the proof by considering $\theta \in \varsigma^{-1}(p_k)$. If $\varsigma^{-1}(p_k)$ has $\mu$-measure zero for any
$p_k$, then we do not need to worry about Logit choice probability derivatives at $\theta \in \varsigma^{-1}(p_k)$.
On the other hand, if $\varsigma^{-1}(p_k)$ has positive $\mu$-measure for some $p_k$, we must assume continuity
of the Logit choice probability derivatives: i.e. $(D_k P^L_j)(\theta, \mathbf{p}) \to 0$ as $p_k \uparrow \varsigma(\theta)$. Otherwise,
the Logit choice probability derivative is not defined on a set of demographics with positive
measure. □

$\Lambda$ is closely related to a familiar economic quantity. Recall that the "inclusive value,"
or expected maximum utility, conditional on demographics is given by (Small and Rosen,
1981; Train, 2003)

$$\iota(\theta, p) = \log\Big( e^{\vartheta(\theta)} + \sum_{j=1}^{J} e^{u_j(\theta, p_j)} \Big).$$

It is easy to see that $\lambda_k$ is the derivative of the "aggregate inclusive value" $\iota(p) =
\int \iota(\theta, p)\,d\mu(\theta)$ with respect to the $k$-th price: $\lambda_k(p) = (D_k \iota)(p) = \int (D_k \iota)(\theta, p)\,d\mu(\theta)$.

Note that $\Gamma(p)$ and $\tilde{\Gamma}(p)$ are not necessarily symmetric for all $p$. If $(Dw_k)(\theta, p)$ is
independent of both $k$ and $p$, as in the case of the Boyd and Mellman (1980) model
presented in Example 1 above, then $\Gamma(p)$ (and thus $\tilde{\Gamma}(p)$) is symmetric for all $p$. On the
other hand, if $(Dw_k)(\theta, \cdot)$ is independent of $k$ and strictly monotone in $p$, as is the case for
the strictly concave in price utility from Berry et al. (1995), then $\gamma_{j,k}(p) = \gamma_{k,j}(p)$ if and
only if $p_j = p_k$.

The following assumption gives a simple, abstract condition on $(u, \vartheta, \mu)$ that guarantees
the Leibniz Rule holds and defines continuously differentiable choice probabilities.

Assumption 2.3. Let $k$ be arbitrary and define $\psi_k : T \times \mathcal{P} \to \mathcal{P}$ by



ifp><i{e) 

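Assume (i) $\psi_k(\theta, \cdot) : (0, \varsigma_*) \to \mathcal{P}$ is continuous for $\mu$-a.e. $\theta \in T$; that is, $\psi_k(\theta, q) \to
\psi_k(\theta, p)$ as $q \to p$ for any $p \in (0, \varsigma_*)$. (ii) $\psi_k(\cdot, p') : T \to \mathcal{P}$ is uniformly $\mu$-integrable for
all $p'$ in some neighborhood of any $p \in (0, \varsigma_*)$; that is, there exists some $\varphi : T \to [0, \infty)$
with $\int \varphi(\theta)\,d\mu(\theta) < \infty$ (that may depend on $k$ and $p$), such that $\psi_k(\theta, p') \le \varphi(\theta)$ for all $p'$
in some neighborhood of $p$.

Note that under Assumption 2.1, (i) requires only that $\psi_k(\theta, p) \to 0$ as $p \uparrow \varsigma(\theta)$ for
$\mu$-a.e. $\theta$.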

Proposition 2.4. If Assumption 2.3 holds, then the Leibniz Rule holds for the Mixed Logit
choice probabilities, which are also continuously differentiable on $(0, \varsigma_*)^J$.

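Proof. Taking for granted that $(D_k P_j^L)(\theta, \cdot)$ is continuous at $p$ and the differences
$$h^{-1}\big(P_j^L(\theta, p + h e_k) - P_j^L(\theta, p)\big) \tag{10}$$

are uniformly $\mu$-integrable for small enough $h$, the Dominated Convergence Theorem
implies that

$$\lim_{h \to 0} h^{-1}\Big( \int P_j^L(\theta, p + h e_k)\,d\mu(\theta) - \int P_j^L(\theta, p)\,d\mu(\theta) \Big)$$

$$= \lim_{h \to 0} \int h^{-1}\big(P_j^L(\theta, p + h e_k) - P_j^L(\theta, p)\big)\,d\mu(\theta)$$
$$= \int \lim_{h \to 0} h^{-1}\big(P_j^L(\theta, p + h e_k) - P_j^L(\theta, p)\big)\,d\mu(\theta)$$

$$= \int (D_k P_j^L)(\theta, p)\,d\mu(\theta).$$

This validates the Leibniz Rule. This proof is essentially that given in a general setting by
Bartle (1966, Chapter 5, pg. 46).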

To complete the proof we must validate that $(D_k P_j^L)(\theta, \cdot)$ is continuous in $p_k$ and the
differences in Eqn. (10) are uniformly $\mu$-integrable in a neighborhood of $p_k$. It is easy to
see that the desired continuity follows from Assumption 2.1 and Assumption 2.3, Condition
(i). Specifically, note that $(D_k P_j^L)(\theta, p) = 0 = \psi_k(\theta, p_k)$ for $\theta \in \{\theta' \in T : p_k > \varsigma(\theta')\}$ and

$$(D_k P_j^L)(\theta, p) = (\delta_{j,k} - P_j^L(\theta, p))\, P_k^L(\theta, p)\,(Dw_k)(\theta, p_k)$$

= {5,, - Pl{e,p)) ,,,, , Me.P.) 
for $\theta \in C(p_k)$. Suppose $p_k = \varsigma(\theta)$. By Assumption 2.1 (a) and (b), the first two terms are
continuous. By Assumption 2.1 (c),

Assumption 2.3, Condition (i) is then necessary for the continuity of $(D_k P_j^L)(\theta, p)$ for all
$j$, $k$ and $p \in (0, \varsigma_*)^J$. Specifically, if $\psi_k(\theta, \cdot)$ is discontinuous at $\varsigma(\theta)$ then

/ em \ 

To prove the integrability, we first note that for all $j$, $k$ and $p$ we have $|(D_k P_j^L)(\theta, p)| \le
\psi_k(\theta, p_k)$. This bound is a consequence of the formula above, and is tight as $p_{-k}$ varies.
The mean value theorem for functions of a single real variable states that

$$h^{-1}\big(P_j^L(\theta, p + h e_k) - P_j^L(\theta, p)\big) = (D_k P_j^L)(\theta, p + \eta e_k)$$

for some $\eta$ such that $|\eta| < |h|$, and thus

$$|h|^{-1}\,\big|P_j^L(\theta, p + h e_k) - P_j^L(\theta, p)\big| \le \psi_k(\theta, p_k + \eta) \le \varphi(\theta)$$

for $\mu$-a.e. $\theta \in T$ and small enough $h$. Thus, the desired uniform $\mu$-integrability follows
from Assumption 2.3, Condition (ii). □



An "easier" bound is simply $|(D_k P_j^L)(\theta, p)| \le |(Dw_k)(\theta, p_k)|$, and thus we might
consider changing the statement of Proposition 2.4 to hypothesize only the uniform $\mu$-integrability
of the utility price derivatives. In fact, this bound can be used to validate the Leibniz Rule
for the Boyd and Mellman model of Example 1 that lacks an outside good. However, this
bound fails to be useful for the Berry et al. model of Example 2, since $w(\theta, p) = \alpha \log(\varsigma(\theta) - p)$
and $|(Dw_k)(\theta, p_k)| = \alpha / (\varsigma(\theta) - p_k)$ is singular on $\varsigma^{-1}(p_k)$. In empirical applications, $\varsigma$ is onto,
generating a singularity somewhere in $T$ for all $p$; this singularity cannot be "controlled" for
all $p$ by choosing the measure $\mu$. In this case, a hypothesis only about the price derivatives
of utility is not useful.

We close this section by stating some basic results concerning $(DP)(p)$ that are used
below.

Lemma 2.5. Under Assumption 2.1, $P_j(p)$ and $\lambda_j(p)$ are never zero on $(0, \varsigma_*)^J$. Thus
$\Lambda(p)$ is nonsingular for all $p \in (0, \varsigma_*)^J$.

Proof. Note that $C(p_j)$ is nonempty and has positive $\mu$-measure, $P_j^L(\cdot, p)$ is strictly positive
on $C(p_j)$, and $(Dw_j)(\cdot, p_j)\, P_j^L(\cdot, p)$ is strictly negative on $C(p_j)$. It follows that $P_j(p)$ and
$\lambda_j(p)$ are nonzero. □



16 W. ROSS MORROW AND STEVEN J. SKERLOS 

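Lemma 2.6. Let $p \in (0, \varsigma_*)^J$, suppose $\vartheta : T \to (-\infty, \infty)$, and define

$$\Omega_f(p) = \Lambda_f(p)^{-1}\, \Gamma_f(p)^\top \in \mathbb{R}^{J_f \times J_f} \quad \text{for all } f \tag{11}$$

$$\tilde{\Omega}(p) = \Lambda(p)^{-1}\, \tilde{\Gamma}(p)^\top \in \mathbb{R}^{J \times J}. \tag{12}$$

These matrices are well-defined by Lemma 2.5, and have the following properties:

(i) $(D_f P_f)(p)^\top = \Lambda_f(p)(I - \Omega_f(p))$ and $(\tilde{D}P)(p)^\top = \Lambda(p)(I - \tilde{\Omega}(p))$.

(ii) $\|\Omega_f(p)\|_\infty < 1$ and $\|\tilde{\Omega}(p)\|_\infty < 1$.

(iii) $I - \Omega_f(p) \in \mathbb{R}^{J_f \times J_f}$ and $I - \tilde{\Omega}(p) \in \mathbb{R}^{J \times J}$ are strictly diagonally dominant and
nonsingular.

(iv) $(I - \Omega_f(p))^{-1} \in \mathbb{R}^{J_f \times J_f}$ and $(I - \tilde{\Omega}(p))^{-1} \in \mathbb{R}^{J \times J}$ map positive vectors to positive
vectors.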
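Proof. (i) This follows immediately from Prop. 2.3.
(ii) We note that

$$\omega_{k,j}(p) = \int P_j^L(\theta, p)\,d\mu_{k,p}(\theta)$$

where $\mu_{k,p}$ is the probability distribution with density, with respect to $\mu$, given by

$$d\mu_{k,p}(\theta) = \frac{P_k^L(\theta, p)\,|(Dw_k)(\theta, p_k)|\,d\mu(\theta)}{\int_{C(p_k)} P_k^L(\phi, p)\,|(Dw_k)(\phi, p_k)|\,d\mu(\phi)}.$$

Thus $\Lambda_f(p)^{-1}\,\Gamma_f(p)^\top$ has row sums

$$\sum_{j \in \mathcal{J}_f} \int P_j^L(\theta, p)\,d\mu_{k,p}(\theta) < 1.$$

The additional assumption that $\vartheta : T \to (-\infty, \infty)$ plays a role in establishing this
inequality because then there is always a set $T' \subset T$ with $\mu_{k,p}(T') > 0$ on which
$\sum_{j \in \mathcal{J}_f} P_j^L(\theta, p) < 1$.

(iii) The inequality

$$|1 - \omega_{k,k}(p)| > \sum_{j \ne k} |\omega_{k,j}(p)|$$

is equivalent to

$$|1 - \omega_{k,k}(p)| = 1 - \int P_k^L(\theta, p)\,d\mu_{k,p}(\theta) > \sum_{j \ne k} \int |P_j^L(\theta, p)|\,d\mu_{k,p}(\theta) = \sum_{j \ne k} \omega_{k,j}(p),$$

which is the row-sum bound from (ii). The claim follows.

(iv) Because $\Omega_f(p)$ maps positive vectors to positive vectors, so does its power series

$$\sum_{n=0}^{\infty} \Omega_f(p)^n = (I - \Omega_f(p))^{-1}.$$

□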

Corollary 2.7. $(D_f P_f)(p)^\top$ and $(\tilde{D}P)(p)^\top$ are strictly diagonally dominant and
nonsingular for $p \in (0, \varsigma_* \mathbf{1})$.

Proof. This follows directly from Lemma 2.6, claims (i) and (iii). □

2.7. The BLP-Markup Equation. A prominent form of the first-order conditions Eqn.
(7) is the BLP-markup equation:

$$p = c + \eta(p) \quad \text{where} \quad \eta(p) = -(\tilde{D}P)(p)^{-\top} P(p). \tag{13}$$

Note that $\eta$ is defined for any continuously differentiable choice probabilities with
nonsingular $(\tilde{D}P)(p)^\top$. We have shown above that this applies to certain Mixed Logit models
(Section 2.6). Eqn. (13) and Corollary 2.7 show that $\eta$ is well-defined and continuous, at
least for $p \in (0, \varsigma_*)^J$.

Traditionally, the BLP-markup equation (13) has been used to estimate costs assuming
observed prices are in equilibrium via the formula $c = p - \eta(p)$; see, e.g., Berry et al.
(1995) or Nevo (2000a). These cost estimates form the basis of counterfactual experiments
with an estimated demand model. Beresteanu and Li (2008) have recently suggested that
the BLP-markup equation is also useful for computing equilibrium prices, a suggestion
we explore further below. Note that the BLP-markup equation must be interpreted as a
nonlinear fixed-point equation when applied to compute equilibrium prices.

We now derive several important properties of $\eta$ from an alternative form of Eqn. (13)
based on Lemma 2.6, valid when $p \in (0, \varsigma_*)^J$:

$$(I - \tilde{\Omega}(p))\,\eta(p) = -\Lambda(p)^{-1} P(p). \tag{14}$$

First, Eqn. (14) proves that profit-optimal markups are positive for the class of Mixed
Logit models we consider, thanks to Lemma 2.6, claim (iv).

Corollary 2.8. Suppose Assumptions 2.1-2.3 hold. Then $\eta(p) > 0$ for all $p \in (0, \varsigma_*)^J$.
Hence if $p \in (0, \varsigma_*)^J$ is a local equilibrium, then $p > c$.

Second, Eqn. (14), rather than Eqn. (13), should be used to actually compute $\eta$. Recall
that $\kappa_2(A)$ denotes the 2-norm condition number of the matrix $A$ (Trefethen and Bau,
1997).

Lemma 2.9. Suppose Assumptions 2.1-2.3 hold. Eqn. (14) is better conditioned than
Eqn. (13), in the sense that

for all $p \in (0, \varsigma_* \mathbf{1})$.
Proof. This follows from Lemma 2.6, claim (i), the inequality $\kappa_2(AB) \le \kappa_2(A)\,\kappa_2(B)$ valid
for any matrices $A$ and $B$, and the formula

$$\kappa_2(\Lambda(p)) = \frac{\max_j |\lambda_j(p)|}{\min_j |\lambda_j(p)|}.$$

□

Lemma 2.9 states that the greater the variation in aggregate absolute rate of change in
inclusive values, the more poorly conditioned $(\tilde{D}P)(p)^\top$ is relative to $I - \tilde{\Omega}(p)$. Because
$\Lambda(p)$ is diagonal, $\|\Lambda(p)\|_1 = \|\Lambda(p)\|_2 = \|\Lambda(p)\|_\infty$ and thus the same bound applies for
condition numbers in norms other than the 2-norm.
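
To make the two routes to $\eta$ concrete, here is a minimal numerical sketch. It assumes a hypothetical simulated model (linear-in-price utility with lognormal sampled coefficients `alpha`, outside utility zero, two firms encoded in `firm`); none of these choices come from the paper. It solves Eqn. (13) and Eqn. (14) for $\eta$ and reports the condition numbers that enter the proof of Lemma 2.9.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = rng.lognormal(0.0, 0.5, size=400)    # hypothetical sampled price coefficients
v = np.array([1.0, 0.5, 0.0, -0.5])          # hypothetical non-price utilities
firm = np.array([0, 0, 1, 1])                # product-to-firm map f(j)
p = np.array([1.0, 2.0, 1.5, 3.0])

e = np.exp(-np.outer(alpha, p) + v)
PL = e / (1.0 + e.sum(axis=1, keepdims=True))          # Logit probabilities per sample
P = PL.mean(axis=0)
lam = -(alpha[:, None] * PL).mean(axis=0)              # diagonal of Lambda
Gam_t = -(PL.T @ (alpha[:, None] * PL)) / len(alpha)   # Gamma
Gam_t = Gam_t * (firm[:, None] == firm[None, :])       # intra-firm part, Gamma-tilde

I = np.eye(len(p))
Omega = Gam_t.T / lam[:, None]               # Lambda^{-1} Gamma-tilde^T
DPt_T = np.diag(lam) - Gam_t.T               # (D~P)^T = Lambda (I - Omega)

eta_13 = -np.linalg.solve(DPt_T, P)              # Eqn. (13)
eta_14 = np.linalg.solve(I - Omega, -P / lam)    # Eqn. (14)

k_13 = np.linalg.cond(DPt_T)                 # 2-norm condition numbers
k_14 = np.linalg.cond(I - Omega)
k_lam = np.abs(lam).max() / np.abs(lam).min()
print(k_13, k_14, k_lam)
```

Both solves return the same markups in exact arithmetic; the printed values only show how the spread of the $|\lambda_j(p)|$ enters the conditioning of Eqn. (13) for this particular assumed model.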

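Third, Eqn. (14) also provides bounds on the magnitude of values taken by $\eta$:

Lemma 2.10. Suppose Assumptions 2.1-2.3 hold. For all $p \in (0, \varsigma_*)^J$, $\eta$ satisfies

$$\frac{\|\Lambda(p)^{-1} P(p)\|_\infty}{1 + \|\tilde{\Omega}(p)\|_\infty} \le \|\eta(p)\|_\infty \le \frac{\|\Lambda(p)^{-1} P(p)\|_\infty}{1 - \|\tilde{\Omega}(p)\|_\infty}. \tag{15}$$

Proof. This follows immediately from Eqn. (14), using the triangle inequality. □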

The upper bound suggests the following assumptions to ensure that $\eta$ itself is bounded:

Assumption 2.4. Suppose there exist $M \in (0, \infty)$ and $\varepsilon \in (0, 1)$ such that

$$\sup\{\|\Lambda(p)^{-1} P(p)\|_\infty : p \in (0, \varsigma_*)^J\} = M < \infty \tag{16}$$

$$\sup\{\|\tilde{\Omega}(p)\|_\infty : p \in (0, \varsigma_*)^J\} = 1 - \varepsilon < 1. \tag{17}$$

Under simple Logit, $P_k^L(p) / |\lambda_k(p)| = |(Dw_k)(p_k)|^{-1}$ and $\Omega_f(p) = \mathbf{1}\, P_f^L(p)^\top$. Thus Eqn.
(16) is akin to concavity of $w_k$ for all sufficiently large $p_k$, and Eqn. (17) is implied by
$\vartheta > -\infty$, i.e. the existence of an outside good with positive purchase probability.

Lemma 2.11. Suppose Assumptions 2.1-2.3 hold.

(i) If Assumption 2.4 also holds, $N = \sup\{\|\eta(p)\|_\infty : p \in (0, \varsigma_*)^J\} < \infty$.

(ii) Moreover, Eqn. (16) in Assumption 2.4 is necessary for $N$ to be finite.

Unfortunately, some simple models do not satisfy Assumption 2.4. A simple Logit model
with $w(p) = -\alpha \log p$ for some $\alpha > 0$ violates Eqn. (16). More generally, the Boyd and
Mellman (1980) model of Example 1 does not satisfy Eqn. (16). This is most easily seen
by noting that finite-sample approximations to this model have

$$\lim_{p_k \to \infty} \|\Lambda(p)^{-1} P(p)\|_\infty = \max_{s = 1, \dots, S} \Big\{ \frac{1}{\alpha_s} \Big\}$$

where $\{\alpha_s\}_{s=1}^{S}$ are the sampled price coefficients. Of course, as $S \to \infty$, $\min_{s=1,\dots,S}\{\alpha_s\} \to
0$, and thus $\|\Lambda(p)^{-1} P(p)\|_\infty \to \infty$.
2.8. The ζ-Markup Function. Substituting Eqn. (9) into Eqn. (7) yields the ζ-markup
equation introduced in Morrow and Skerlos (2010):

$$p = c + \zeta(p) \quad \text{where} \quad \zeta(p) = \Lambda(p)^{-1}\, \tilde{\Gamma}(p)^\top (p - c) - \Lambda(p)^{-1} P(p) \tag{18}$$

when $\Lambda(p)$ is nonsingular (Section 2.6, 2.8). Thus the ζ-markup equation is specific to
Mixed Logit models, unlike the BLP-markup equation.
We observe a relationship between the maps $\zeta$ and $\eta$.

Proposition 2.12. Suppose Assumptions 2.1-2.3 hold. For any $p \in (0, \varsigma_*)^J$, $\zeta(p) =
\tilde{\Omega}(p)(p - c) + (I - \tilde{\Omega}(p))\,\eta(p)$.

Proof. This follows directly from Eqns. (14) and (18). □

Insofar as $\eta$ and $\zeta$ are distinct maps, they can generate numerical methods for the
computation of equilibrium prices with entirely different properties. The equation above
implies that $\zeta(p) = \eta(p)$ if, and only if, $p - c - \eta(p) = p - c - \zeta(p)$ lies in the null space
of $\tilde{\Omega}(p)$. Thus if $\tilde{\Omega}(p)$ is full-rank, $\zeta$ and $\eta$ coincide only at simultaneously stationary
prices. Simple examples of Mixed Logit models can be constructed that always have
$\operatorname{rank}(\tilde{\Omega}(p)) = J$. For Logit, $\Omega_f(p) = \mathbf{1}\, P_f^L(p)^\top$ for all $f$ and $\tilde{\Omega}(p)$ always has rank
$F < J$. However, the analysis in Morrow and Skerlos (2008) can be used to show that $\zeta$
and $\eta$ coincide only at simultaneously stationary prices.
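
The relationship in Proposition 2.12 is easy to check numerically. The sketch below again uses a hypothetical simulated model (linear-in-price utility, lognormal sampled coefficients, outside utility zero; the arrays `alpha`, `v`, `c`, `firm` are illustrative assumptions). It computes $\eta$ from Eqn. (13) and $\zeta$ from Eqn. (18) independently, then compares $\zeta$ with the right-hand side of the proposition and shows that both markup residuals are rescalings of the same first-order condition residual.

```python
import numpy as np

rng = np.random.default_rng(3)
alpha = rng.lognormal(0.0, 0.5, size=400)    # hypothetical sampled price coefficients
v = np.array([1.0, 0.5, 0.0, -0.5])
c = np.array([0.5, 0.4, 0.6, 0.3])           # hypothetical unit costs
firm = np.array([0, 0, 1, 1])
p = np.array([1.0, 2.0, 1.5, 3.0])

e = np.exp(-np.outer(alpha, p) + v)
PL = e / (1.0 + e.sum(axis=1, keepdims=True))
P = PL.mean(axis=0)
lam = -(alpha[:, None] * PL).mean(axis=0)
Gam_t = -(PL.T @ (alpha[:, None] * PL)) / len(alpha)
Gam_t = Gam_t * (firm[:, None] == firm[None, :])       # Gamma-tilde

I = np.eye(len(p))
Omega = Gam_t.T / lam[:, None]               # Lambda^{-1} Gamma-tilde^T
DPt_T = np.diag(lam) - Gam_t.T               # (D~P)^T

eta = -np.linalg.solve(DPt_T, P)                       # Eqn. (13)
zeta = (Gam_t.T @ (p - c) - P) / lam                   # Eqn. (18)
zeta_from_eta = Omega @ (p - c) + (I - Omega) @ eta    # Proposition 2.12

# First-order condition residual, written with the intra-firm Jacobian.
grad = DPt_T @ (p - c) + P
print(np.max(np.abs(zeta - zeta_from_eta)))
print(np.max(np.abs(grad - lam * (p - c - zeta))))
```

At a generic price vector such as the one chosen here, $\zeta(p)$ and $\eta(p)$ differ even though the identity holds.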

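We now explore $\zeta$'s asymptotic properties.

Lemma 2.13. Under Assumption 2.4, $\|\zeta(p)\|_\infty < \|p - c\|_\infty$ whenever $\|p - c\|_\infty > M/\varepsilon$.
Moreover, $\|p - c\|_\infty - \|\zeta(p)\|_\infty \to \infty$ as $\|p - c\|_\infty \to \infty$.

Proof. We simply note that

$$\|\zeta(p)\|_\infty \le \|\tilde{\Omega}(p)\|_\infty \|p - c\|_\infty + \|\Lambda(p)^{-1} P(p)\|_\infty$$

$$\le (1 - \varepsilon)\|p - c\|_\infty + M$$

$$= \|p - c\|_\infty - (\varepsilon \|p - c\|_\infty - M)$$

$$= \Big[ 1 - \varepsilon + \frac{M}{\|p - c\|_\infty} \Big] \|p - c\|_\infty.$$

Now if $\|p - c\|_\infty > M/\varepsilon$, then $M / \|p - c\|_\infty < \varepsilon$. Thus

$$\|\zeta(p)\|_\infty < [1 - \varepsilon + \varepsilon]\,\|p - c\|_\infty = \|p - c\|_\infty.$$

To prove that $\|p - c\|_\infty - \|\zeta(p)\|_\infty \to \infty$, note that

$$\|p - c\|_\infty - \|\zeta(p)\|_\infty \ge \Big( \varepsilon - \frac{M}{\|p - c\|_\infty} \Big) \|p - c\|_\infty.$$

For all $\|p - c\|_\infty > M/\varepsilon$, the term in parentheses is positive. Furthermore, this term
approaches $\varepsilon$ as $\|p - c\|_\infty \to \infty$. Thus $\|p - c\|_\infty - \|\zeta(p)\|_\infty \to \infty$ as $\|p - c\|_\infty \to \infty$. □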

A slightly different assumption concerning $\tilde{\Omega}(p)$ is useful when analyzing the $\zeta$ map.

Assumption 2.5. Suppose

$$\sup\{\|\tilde{\Omega}(p)(p - c)\|_\infty : p \in (0, \varsigma_*)^J\} < \infty. \tag{19}$$

Lemma 2.14. Suppose Assumptions 2.1-2.3 hold and $\varsigma_* = \infty$. Then $\zeta$ is bounded if, and
only if, Eqn. (16) and Eqn. (19) hold.

Proof. This follows directly from the triangle inequality and the non-negativity of $\tilde{\Omega}(p)(p -
c)$ and $-\Lambda(p)^{-1} P(p)$ for all $p > c$. □

For future reference, we prove that Eqn. (19) strengthens Eqn. (17). 

Lemma 2.15. If Eqn. (19) holds, then Eqn. (17) holds. 

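Proof. Note that Eqn. (19) implies that for any $k$,

$$\lim_{p_j \to \infty} \big( \omega_{k,j}(p)(p_j - c_j) \big) < \infty \quad \text{for all } j \in \mathcal{J}_{f(k)}.$$

This, in turn, implies that $\omega_{k,j}(p) \to 0$ as $p_j \to \infty$.

Now Eqn. (17) fails only if $\lim_{p \to q} \|\tilde{\Omega}(p)\|_\infty = 1$ where $q$ has some $q_j = \infty$. But the
row sums of $\tilde{\Omega}(p)$ satisfy

$$\lim_{p \to q} \sum_{j \in \mathcal{J}_{f(k)}} \omega_{k,j}(p) = \sum_{j \in \mathcal{J}_{f(k)} \cap \{ j : q_j < \infty \}} \omega_{k,j}(q) < 1.$$

Thus if Eqn. (19) holds, Eqn. (17) cannot fail. □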



2.9. Existence of Simultaneously Stationary Prices. This section provides two
existence results. Neither establishes the existence of a local equilibrium, or the uniqueness of
simultaneously stationary points. Addressing the existence of a local equilibrium will require
additional conditions to ensure that each firm's profits are locally concave at the
simultaneously stationary prices whose existence can be ensured (Morrow, 2008). Little is known
about how to address the uniqueness of simultaneously stationary points. Indeed, Morrow
and Skerlos (2010) provide an example of a Mixed Logit model with 9 simultaneously
stationary prices, 4 of which are local equilibria and 2 of which are proper equilibria.

Assumption 2.4 ensures the existence of finite simultaneously stationary prices when
$\varsigma_* = \infty$.

Corollary 2.16. Suppose $\varsigma_* = \infty$ and Assumptions 2.1-2.4 hold. Then there exists
at least one vector of finite simultaneously stationary prices.

Proof. This is a direct consequence of Brouwer's fixed-point theorem. $c + \eta(\cdot)$ is a continuous
map that takes the compact, convex set $[0, M/\varepsilon]^J$ into itself, and thus there is at least
one fixed-point $p = c + \eta(p) \in [0, M/\varepsilon]^J$. □

To apply Corollary 2.16 to cases when $\varsigma_* < \infty$, $\eta$ must be extended from $(0, \varsigma_*)^J$ to
all of $\mathcal{P}^J$, preserving the bounds (15). This is easy for many of the simulation-based
approximations encountered in practice, but difficult for the general case.

We can extend this existence result using Eqn. (22) and the $\zeta$ map.
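Lemma 2.17. Suppose $\varsigma_* = \infty$, Assumptions 2.1-2.3, Eqn. (19), and Eqn. (22) hold.
Then there exists some $q_k > 0$ such that $p_k - c_k - \zeta_k(p) > 0$ for all $p$ with $p_k > q_k$.

Proof. The assumed bound implies

$$\Big| \sum_j \omega_{k,j}(p)(p_j - c_j) \Big| \le L < \infty$$

for any $k$ and any $p \in (0, \infty)^J$. Consider

$$p_k - c_k - \zeta_k(p) = (p_k - c_k) - \sum_j \omega_{k,j}(p)(p_j - c_j) + \frac{P_k(p)}{\lambda_k(p)}$$

$$= \Big[ 1 - \frac{P_k(p)}{|\lambda_k(p)|\,(p_k - c_k)} \Big] (p_k - c_k) - \sum_j \omega_{k,j}(p)(p_j - c_j).$$

If Eqn. (22) holds, then

$$\lim_{p_k \to \infty} \frac{P_k(p)}{|\lambda_k(p)|\,(p_k - c_k)} \le \delta < 1.$$

Thus for any $\varepsilon > 0$, there exists some $\bar{p}_k > 0$ and $\Delta(p)$ with $|\Delta(p)| < \varepsilon$ such that

$$\frac{P_k(p)}{|\lambda_k(p)|\,(p_k - c_k)} \le \delta + \Delta(p) \quad \text{for all } p_k > \bar{p}_k.$$

Thus

$$p_k - c_k - \zeta_k(p) \ge (1 - \delta - \Delta(p))(p_k - c_k) - L \quad \text{for all } p_k > \bar{p}_k.$$

In particular, if we choose $\varepsilon < (1 - \delta)/2$ we have

$$p_k - c_k - \zeta_k(p) \ge \Big( \frac{1 - \delta}{2} \Big)(p_k - c_k) - L = \Big( \frac{1 - \delta}{2} \Big) \Big[ (p_k - c_k) - \frac{2L}{1 - \delta} \Big] > 0$$

for all $p_k > q_k = \max\{ c_k + 2L/(1 - \delta),\ \bar{p}_k \}$. □

One consequence of this lemma is that infinite prices are never an equilibrium.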



Corollary 2.18. Under the assumptions of Lemma 2.17, any profit derivative is eventually
negative.

Proof. Note that

$$(D_k \pi_{f(k)})(p) = -|\lambda_k(p)|\,\big(p_k - c_k - \zeta_k(p)\big).$$

Since $p_k - c_k - \zeta_k(p)$ is positive for all large enough $p_k$, $(D_k \pi_{f(k)})(p)$ is negative for all
large enough $p_k$, regardless of $p_{-k}$. □

Another consequence of Lemma 2.17 is an alternative existence result. 

Corollary 2.19. Under the assumptions of Lemma 2.17 there exists at least one simulta- 
neously stationary point. 






Proof. Following Morrow and Skerlos (2008), we prove this proposition using the Poincaré-
Hopf Theorem (Milnor, 1965). The logic is simple: we consider the vector field
$p - c - \zeta(p)$ on a hyper-rectangle $[c, q]$ whose critical points are simultaneously stationary;
$q$ has components $q_k$ defined in Lemma 2.17. The Poincaré-Hopf Theorem then states that
the sum of the indices of all the critical points of this vector field equals one, the Euler
characteristic of $[c, q]$. Thus it is not possible that the vector field has no critical points,
for then the sum of indices would be zero.

We must only prove one property of $p - c - \zeta(p)$: that this vector field points outward
on the boundary of the chosen hyper-rectangle. Half of this proof is Lemma 2.17, in which
we prove that $p_k - c_k - \zeta_k(p) > 0$ if $p \in [c, q]$ with $p_k = q_k$. We must also show that
$p_k - c_k - \zeta_k(p) < 0$ if $p \in [c, q]$ with $p_k = c_k$. But by Eqn. (18), when $p_k = c_k$,

$$p_k - c_k - \zeta_k(p) = -\sum_{j \ne k} \omega_{k,j}(p)(p_j - c_j) - \frac{P_k(p)}{|\lambda_k(p)|} < 0. \qquad □$$



This proof does not need to make any claims about the number of critical points, or of 
their indices. If it can be shown that any critical point of p — c — C(p) cannot have a zero 
or negative index, then the simultaneously stationary point is unique. 



3. Computational Methods

This section provides details for the four approaches to computing equilibrium prices
described in Morrow and Skerlos (2010); see Table 4. Section 3.1 briefly reviews Newton's
method, followed by application of Newton's method to solve Eqn. (7) in Section 3.2.
Newton's method applied directly to Eqn. (7) may compute "spurious" solutions with
infinite prices because the combined gradient vanishes as prices increase without bound.
Section 3.3 avoids this difficulty by applying Newton's method to the two markup equations
instead of Eqn. (7) itself. Section 3.4 discusses fixed-point iterations based on the BLP-
and ζ-markup equations, and Section 3.5 reviews a number of practical considerations.

Table 4. Summary of the numerical methods examined in this article.

Newton Methods (NM)
Abbr.    Method                                         Section   Advantage          Our Experience (a)
CG-NM    Solve $F_\pi(p) = (\nabla\pi)(p) = 0$            3.2                          Unreliable, slow
η-NM     Solve $F_\eta(p) = p - c - \eta(p) = 0$          3.3       Coercive           Reliable, slow
ζ-NM     Solve $F_\zeta(p) = p - c - \zeta(p) = 0$        3.3       Coercive           Reliable, slow

Fixed-Point Iterations (FPI)
Abbr.    Method                                         Section   Advantage          Our Experience
ζ-FPI    Iterate $p \leftarrow c + \zeta(p)$              3.4       Easy to evaluate   Reliable, fast
η-FPI    Iterate $p \leftarrow c + \eta(p)$               3.4                          Not convergent

(a) Conclusions on the behavior of these methods are based on the numerical experiments
described in Morrow and Skerlos (2010), using a novel GMRES-Newton method with
Levenberg-Marquardt style trust-region global convergence strategy.

3.1. Newton's Method. Newton's method, a classical technique to compute a zero of
an arbitrary function $F : \mathbb{R}^N \to \mathbb{R}^N$, is now a portfolio of related approaches to solve
nonlinear systems (Ortega and Rheinboldt, 1970; Kelley, 1995; Dennis and Schnabel, 1996;
Judd, 1998; Kelley, 2003). Generally speaking, Newton-type methods are differentiated in
two relatively independent directions: (i) the technique used to approximate the Jacobian
matrices ($DF$) and solve for the Newton step and (ii) the technique used to enforce
convergence from arbitrary initial conditions. See Dennis and Schnabel (1996), Judd (1998), or
Kelley (2003) for good treatments of these issues. Choosing the right variant of Newton's
method determines the reliability and efficiency of equilibrium price computations.

Problem formulation also determines the reliability and efficiency of equilibrium price
computations using Newton's method. Scalings of the variables and function values are
one prominent example of a problem transformation that improves the performance of
Newton's method (Dennis and Schnabel, 1996). Nonlinear problem preconditioning can
also be important (Cai and Keyes, 2002), as the following example demonstrates.

Example 5. Let $F : \mathbb{R}^N \to \mathbb{R}^N$ be defined by $F(x) = x / (1 + \|x\|_2^2)$. Iterating Newton
steps converges to the unique (finite) zero $x_* = 0$ only from initial conditions $x_0$ with
$\|x_0\|_2 < 1/\sqrt{3}$. Newton's method diverges or fails for all other starting points. Standard
global convergence strategies for Newton's method (line search, trust region methods) cannot
improve this poor global convergence behavior because $\|F(x)\|_2$ has unbounded level sets;
see Morrow and Skerlos (2010) for details.

A simple nonlinear transformation overcomes this poor global convergence behavior. Note
that $F(x) = A(x) f(x)$ where $A(x) = (1 + \|x\|_2^2)^{-1} I$ and $f(x) = x$. Because $A(x)$ is
nonsingular for all $x$, the problems $F(x) = 0$ and $f(x) = 0$ have identical solution sets.
However, applying Newton's method to the problem $f(x) = 0$ trivially converges in a single
step from any initial condition without a global convergence strategy.
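
The behavior described in Example 5 can be reproduced in a few lines. The following sketch, written for illustration only (the helper `newton` and the chosen starting points are assumptions, not taken from the paper), runs undamped Newton iterations on $F(x) = x / (1 + \|x\|_2^2)$ in two dimensions from one start inside the radius $1/\sqrt{3} \approx 0.577$ and one start outside it.

```python
import numpy as np

def F(x):
    return x / (1.0 + x @ x)

def DF(x):
    # Jacobian of x / (1 + ||x||^2): I / (1 + r^2) - 2 x x^T / (1 + r^2)^2
    r2 = x @ x
    return np.eye(len(x)) / (1.0 + r2) - 2.0 * np.outer(x, x) / (1.0 + r2) ** 2

def newton(x0, iters=30):
    """Plain Newton iteration with no global convergence strategy."""
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        x = x - np.linalg.solve(DF(x), F(x))
    return x

direction = np.array([1.0, 1.0]) / np.sqrt(2.0)
x_inside = newton(0.5 * direction)    # ||x0|| = 0.5 < 1/sqrt(3)
x_outside = newton(0.7 * direction)   # ||x0|| = 0.7 > 1/sqrt(3)

print(np.linalg.norm(x_inside))       # shrinks toward zero
print(np.linalg.norm(x_outside))      # grows with the iteration count
```

Along any ray the Newton update reduces to $x \mapsto -2\|x\|_2^2\, x / (1 - \|x\|_2^2)$, which is a contraction exactly when $\|x\|_2^2 < 1/3$; this is where the threshold in the example comes from.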

Example 5 illustrates why computing equilibrium prices based on the markup equations 
is more reliable and efficient than using Eqn. (7) directly. The following two sections echo 
the pattern of this example to provide the details. 

3.2. Newton's Method on the Combined Gradient. The most direct approach to
compute equilibrium prices using Newton's method is to solve $F_\pi(p) = (\nabla\pi)(p) = 0$,
abbreviated CG-NM in Table 4. This approach works well when the initial condition is near
an equilibrium, as required by theory (Ortega and Rheinboldt, 1970; Kelley, 1995; Dennis
and Schnabel, 1996). In practice, computing counterfactual equilibrium prices starting with
the observed prices may exploit this local convergence if changes to exogenous variables
have a relatively small impact on equilibrium prices. On the other hand, CG-NM can be
unreliable when started "far" from equilibrium.

The challenge is the tendency for the derivatives of profits to vanish as prices become
large (Morrow and Skerlos, 2010), as demonstrated in Example 6 below.

Example 6. Consider a simple Logit model with linear in price utility and an outside good:
$u(p) = -\alpha p + v$ for some $\alpha > 0$ and any $v \in \mathbb{R}$, and $\vartheta > -\infty$. The derivative of firm $f$'s
profit function with respect to the price of product $k \in \mathcal{J}_f$ is

$$(D_k \pi_f)(p) = -\alpha P_k^L(p)(p_k - c_k) + \alpha P_k^L(p)\,\pi_f(p) + P_k^L(p).$$

Since $P_k^L(p)$ and $P_k^L(p)(p_k - c_k)$ both vanish as $p_k \to \infty$ (as is easily checked), $\pi_f(p)$ is
bounded in $p$. Thus $(D_k \pi_f)(p) \to 0$ as $p_k \to \infty$.

We now provide a general assumption under which $(D_k \pi_{f(k)})(p) \to 0$ as $p_k \uparrow \varsigma_*$.

Assumption 3.1. Let $\psi_k$ be defined as in Assumption 2.3. Assume: (i) $p_k \psi_k(\theta, p_k) \to 0$
as $p_k \uparrow \varsigma(\theta)$ for $\mu$-a.e. $\theta$. (ii) There exists $M < \infty$ and $\bar{p}_k \in [0, \varsigma_*)$ such that
$p_k \int \psi_k(\theta, p_k)\,d\mu(\theta) \le M$ for all $p_k \in (\bar{p}_k, \varsigma_*)$.

As with Assumption 2.3 above, (i) and (ii) are essentially conditions for the Dominated
Convergence Theorem.

Assumption 3.1 (i) extends Assumption 2.3 (i) to include a neighborhood of $\varsigma_*$. Note
that if $\varsigma(\theta) < \infty$ then (i) holds if, and only if, $\psi_k(\theta, p_k) \to 0$ as $p_k \uparrow \varsigma(\theta)$; i.e. $\psi_k(\theta, \cdot)$
is continuous at $\varsigma(\theta)$. Thus if $\varsigma(\theta) < \infty$, Assumption 3.1 (i) and Assumption 2.3 (i) are
the same. If $\varsigma(\theta) = \infty$ and $p_k \psi_k(\theta, p_k) \to 0$ as $p_k \uparrow \infty$, then necessarily $\psi_k(\theta, p_k) \to 0$ as
$p_k \uparrow \infty$. The converse, however, need not hold.

If $\varsigma_* < \infty$, Assumption 3.1 (ii) simply says that $\int \psi_k(\theta, p_k)\,d\mu(\theta)$ is bounded as $p_k \uparrow \varsigma_*$.
This is not implied by Assumption 2.3 (ii), but is a natural extension of it.

Lemma 3.1. Suppose Assumptions 2.1-3.1 hold. Then $p_k |\lambda_k(p)| \to 0$ as $p_k \uparrow \varsigma_*$ for all
$k$. Additionally, $p_k |\gamma_{j,k}(p)| \to 0$ as $p_k \uparrow \varsigma_*$ for all $j$. Subsequently, $(D_k \pi_{f(k)})(p) \to 0$ as
$p_k \uparrow \varsigma_*$.

Proof. Let $\{p_k^{(n)}\}_{n \in \mathbb{N}} \subset (0, \varsigma_*)$ be any sequence converging to $\varsigma_*$. Define $\Psi_k^{(n)} : T \to \mathcal{P}$ by
$\Psi_k^{(n)}(\theta) = p_k^{(n)} \psi_k(\theta, p_k^{(n)})$. The functions $\{\Psi_k^{(n)}\}_{n \in \mathbb{N}}$ converge pointwise to zero and have
integrals uniformly bounded by the constant $M$. By the Dominated Convergence Theorem,

$$\lim_{n \to \infty} \int \Psi_k^{(n)}(\theta)\,d\mu(\theta) = 0.$$

□

In other words, under Assumption 3.1 the components of $F_\pi$ vanish as the corresponding
price tends to $\varsigma_*$, even though this may not mean that $\varsigma_*$ maximizes profits. Because of
this, CG-NM may converge to a zero of $F_\pi$ at $\varsigma_* \mathbf{1}$, or with some components equal to $\varsigma_*$,
that is not an equilibrium.
Note that even though the price derivatives vanish at infinity, this does not mean that
infinite prices maximize profits. Nonetheless, CG-NM may converge to a zero of $F_\pi$ with
some components equal to infinity that is not an equilibrium. Moreover, because the
components of $F_\pi(p)$ can vanish over some divergent sequences, standard global convergence
strategies based on minimizing $\|F_\pi(p)\|_2$ will not be effective ways of avoiding this
behavior. As in Example 5, we must reformulate the problem to obtain reliable and efficient
approaches for computing equilibrium prices.
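
A small numerical illustration of the vanishing gradient for a simple Logit model with linear in price utility follows. It is a sketch under assumed values: a single firm with two products, $\alpha = 1$, $v = 0$, unit costs of one, and outside utility zero; the function name `logit_profit_and_grad` is hypothetical. It evaluates the profit derivative formula for that model at increasingly large common prices.

```python
import numpy as np

def logit_profit_and_grad(p, c, v, alpha):
    """Profit and its price derivatives for one firm owning all products
    under simple Logit with u_j = -alpha * p_j + v_j and outside utility 0."""
    e = np.exp(-alpha * p + v)
    P = e / (1.0 + e.sum())
    profit = P @ (p - c)
    grad = -alpha * P * (p - c) + alpha * P * profit + P
    return profit, grad

c = np.array([1.0, 1.0])
v = np.array([0.0, 0.0])
alpha = 1.0

levels = [2.0, 10.0, 50.0, 100.0]
profits, grad_norms = [], []
for t in levels:
    profit, grad = logit_profit_and_grad(t * np.ones(2), c, v, alpha)
    profits.append(profit)
    grad_norms.append(np.max(np.abs(grad)))

for t, prof, g in zip(levels, profits, grad_norms):
    print(f"p = {t:6.1f}   profit = {prof:.3e}   max |D_k pi| = {g:.3e}")
```

The derivative shrinks toward zero as prices grow while profits also shrink, so a solver that only monitors $\|F_\pi(p)\|_2$ has no way to tell such points from genuine stationary prices.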

3.3. Newton's Method and the Markup Equations. Reliable and efficient
implementations of Newton's method are found by observing that the combined gradient, $F_\pi$, can
be written as follows:

$$F_\pi(p) = (\tilde{D}P)(p)^\top F_\eta(p) \quad \text{where} \quad F_\eta(p) = p - c - \eta(p) \tag{20}$$

$$F_\pi(p) = \Lambda(p)\, F_\zeta(p) \quad \text{where} \quad F_\zeta(p) = p - c - \zeta(p). \tag{21}$$

Either $F_\eta$ or $F_\zeta$ can be used to compute simultaneously stationary prices when $(\tilde{D}P)(p)^\top$
and $\Lambda(p)$, respectively, are nonsingular (Morrow and Skerlos, 2010). Of course, $F_\eta$ and
$F_\zeta$ recast the first-order condition as a fixed-point problem: $F_\eta$ is zero if and only if the
BLP-markup equation holds, and $F_\zeta$ is zero if and only if the ζ-markup equation holds.

Solving $F_\eta(p) = 0$ or $F_\zeta(p) = 0$, abbreviated η-NM and ζ-NM respectively in Table
4, requires the solution of nontrivial nonlinear systems with Newton's method. η-NM and
ζ-NM, however, are less likely to have the computational problems that CG-NM exhibits
because they exploit norm-coercivity of the maps $F_\eta$ and $F_\zeta$ (Morrow and Skerlos, 2010).
A norm-coercive map has a norm that tends to infinity with the norm of its argument
(Ortega and Rheinboldt, 1970; Harker and Pang, 1990). Globally convergent implementations
of Newton's method that decrease the value of $\|F(p)\|_2$ in each step produce bounded
sequences of iterates when $F$ is norm-coercive. Thus, solving the BLP- or ζ-markup
equation instead of the literal first-order condition removes the tendency for applications of
Newton's method to compute "spurious" solutions at infinity.

We now prove that the maps $F_\eta$ and $F_\zeta$ are indeed coercive.

Lemma 3.2. Suppose $\varsigma_* = \infty$ and Assumptions 2.1-2.3 hold.

(i) Norm-coercivity of $F_\zeta(p)$ implies that of $F_\eta(p)$.

(ii) If Eqn. (17) holds, then norm-coercivity of $F_\eta(p)$ implies that of $F_\zeta(p)$.

Proof. Proposition 2.12 implies that

$$p - c - \zeta(p) = (I - \tilde{\Omega}(p))\,(p - c - \eta(p)).$$

To prove (i), note that

$$\|p - c - \eta(p)\|_\infty \ge \Big( \frac{1}{1 + \|\tilde{\Omega}(p)\|_\infty} \Big) \|p - c - \zeta(p)\|_\infty \ge \Big( \frac{1}{2} \Big) \|p - c - \zeta(p)\|_\infty.$$

To prove (ii), note that if Eqn. (17) holds,

$$\|p - c - \zeta(p)\|_\infty \ge (1 - \|\tilde{\Omega}(p)\|_\infty)\,\|p - c - \eta(p)\|_\infty \ge \varepsilon\,\|p - c - \eta(p)\|_\infty.$$

□
Lemma 3.3. Suppose $\varsigma_* = \infty$ and Assumptions 2.1-2.3 and 2.4 hold. Then

$$\lim_{\|p\|_\infty \to \infty} \|p - c - \eta(p)\|_\infty = \infty = \lim_{\|p\|_\infty \to \infty} \|p - c - \zeta(p)\|_\infty.$$

Proof. The norm-coercivity of $F_\eta$ is a trivial consequence of the boundedness of $\eta$ under
Assumption 2.4. The norm-coercivity of $F_\zeta$ then follows from Lemma 3.2. □

We now weaken Assumption 2.4's Eqn. (16).

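Assumption 3.2. Suppose that $\varsigma_* = \infty$ and

$$\lim_{M \uparrow \infty}\ \sup\Big\{ \frac{\|\Lambda(p)^{-1} P(p)\|_\infty}{\|p\|_\infty} : p \in \mathcal{P}^J,\ \|p\|_\infty \ge M \Big\} = \delta \in [0, 1). \tag{22}$$

Note that the limit is of a non-increasing sequence of non-negative numbers, and thus
exists.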

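Lemma 3.4. Assuming Eqn. (22) is equivalent to assuming that for any sequence $p_n$ with
$\|p_n\|_\infty \to \infty$, $\lim_{n \to \infty} \|\Lambda(p_n)^{-1} P(p_n)\|_\infty / \|p_n\|_\infty \le \delta$.

Proof. If Eqn. (22) holds, then for any $\varepsilon > 0$ there exists an $\bar{M} > 0$ such that

$$\sup\Big\{ \frac{\|\Lambda(p)^{-1} P(p)\|_\infty}{\|p\|_\infty} : p \in \mathcal{P}^J,\ \|p\|_\infty \ge \bar{M} \Big\} \le \delta + \varepsilon.$$

If $\|p_n\|_\infty \to \infty$, then there is also an $N$ such that $\|p_n\|_\infty \ge \bar{M}$ for all $n \ge N$. Thus

$$\frac{\|\Lambda(p_n)^{-1} P(p_n)\|_\infty}{\|p_n\|_\infty} \le \delta + \varepsilon$$

for all $n \ge N$, and thus

$$\lim_{n \to \infty} \frac{\|\Lambda(p_n)^{-1} P(p_n)\|_\infty}{\|p_n\|_\infty} \le \delta.$$

Conversely, if Eqn. (22) fails, then there is an $\bar{M} > 0$ such that

$$S(M) = \sup\Big\{ \frac{\|\Lambda(p)^{-1} P(p)\|_\infty}{\|p\|_\infty} : p \in \mathcal{P}^J,\ \|p\|_\infty \ge M \Big\} \ge 1$$

for all $M \ge \bar{M}$. We can thus choose $p_M$ with $\|p_M\|_\infty \ge M$ satisfying

$$S(M) - \frac{\|\Lambda(p_M)^{-1} P(p_M)\|_\infty}{\|p_M\|_\infty} \le \frac{1}{M}.$$

In other words,

$$\frac{\|\Lambda(p_M)^{-1} P(p_M)\|_\infty}{\|p_M\|_\infty} \ge S(M) - \frac{1}{M} \ge 1 - \frac{1}{M}$$

for all $M \ge \bar{M}$, and thus

$$\lim_{M \to \infty} \frac{\|\Lambda(p_M)^{-1} P(p_M)\|_\infty}{\|p_M\|_\infty} \ge 1.$$

Hence the "sequence version" of Eqn. (22) fails, and thus by contraposition the sequence
version and Eqn. (22) are identical. □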
Next we note that Eqn. (22) weakens Eqn. (16).

Lemma 3.5. If Eqn. (16) holds, then Eqn. (22) holds.

Proof. If $\sup\{\|\Lambda(p)^{-1} P(p)\|_\infty : p \in \mathcal{P}^J\} \le M$, then

$$S(L) = \sup\Big\{ \frac{\|\Lambda(p)^{-1} P(p)\|_\infty}{\|p\|_\infty} : p \in \mathcal{P}^J,\ \|p\|_\infty \ge L \Big\} \le \frac{M}{L}.$$

Thus $\lim_{L \to \infty} S(L) = 0$, a special case of Eqn. (22). □

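Now we prove the alternative coercivity result.

Lemma 3.6. Suppose $\varsigma_* = \infty$ and Assumptions 2.1-2.3, 2.5 and 3.2 hold. Then

$$\lim_{\|p\|_\infty \to \infty} \|p - c - \eta(p)\|_\infty = \infty = \lim_{\|p\|_\infty \to \infty} \|p - c - \zeta(p)\|_\infty.$$

Proof. We prove the claim for $\zeta$; the result for $\eta$ then follows from Lemma 2.15. Note that

$$|p_k - c_k - \zeta_k(p)| = \Big| p_k - c_k - \sum_j \omega_{k,j}(p)(p_j - c_j) + \frac{P_k(p)}{\lambda_k(p)} \Big|$$

$$\ge \Big| 1 + \frac{P_k(p)}{p_k \lambda_k(p)} \Big|\, p_k - \Big| c_k + \sum_j \omega_{k,j}(p)(p_j - c_j) \Big|.$$

Suppose that $p_k \to \infty$. By assumption,

$$\lim_{p_k \to \infty} \Big( 1 + \frac{P_k(p)}{p_k \lambda_k(p)} \Big) \ge 1 - \delta > 0$$

while the second term is bounded. Thus

$$\Big| p_k - c_k - \sum_j \omega_{k,j}(p)(p_j - c_j) + \frac{P_k(p)}{\lambda_k(p)} \Big| \to \infty.$$

□

Note that since we did not require that Eqn. (17) held, $\zeta$ need not be bounded for $F_\eta$
and $F_\zeta$ to be coercive.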



3.4. Fixed-Point Iteration. In addition to applications of Newton's method, the BLP-
and ζ-markup equations suggest applying fixed-point iteration to solve for equilibrium
prices. We examine fixed-point iterations based on both equations.

3.4.1. ζ Fixed-Point Iteration. The fixed-point iteration $p \leftarrow c + \zeta(p)$ based on the
ζ-markup equation, here abbreviated ζ-FPI, can efficiently compute equilibrium prices for
some problems. ζ-FPI has relatively efficient steps because no linear systems need to be
solved, unlike every other method listed in Table 4. While we are not aware of a general
convergence proof for ζ-FPI, this iteration has converged reliably on test problems including
the examples in Morrow and Skerlos (2010).

The first observation we make is that the ζ-FPI steps always point in directions of
"myopic gradient ascent."

Lemma 3.7. Let $p \in (0, \varsigma_*)^J$ and let $\delta p = c + \zeta(p) - p$ denote the ζ-FPI step. Then

$$\frac{1}{\max_j |\lambda_j(p)|} \le \frac{(\nabla\pi)(p)^\top \delta p}{(\nabla\pi)(p)^\top (\nabla\pi)(p)} \le \frac{1}{\min_j |\lambda_j(p)|}.$$

Similarly, let $\theta(p)$ denote the angle between $\delta p$ and $(\nabla\pi)(p)$, and suppose $p$ is not
simultaneously stationary. Then

$$\cos\theta(p) \ge \frac{\min_j |\lambda_j(p)|}{\max_j |\lambda_j(p)|}.$$

Proof. Both results follow directly from the equation $(\nabla\pi)(p) = |\Lambda(p)|\,\delta p$ where $|\Lambda(p)|$
denotes the absolute value of the components of $\Lambda(p)$. □

Specifically, the ζ-FPI steps have a positive projection onto the combined gradient, and
cannot become orthogonal to the combined gradient over any sequence of non-simultaneously
stationary prices that stay in $(0, \varsigma_*)^J$.
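
The bounds in Lemma 3.7 can be spot-checked on random price vectors. This sketch assumes a hypothetical simulated model (linear-in-price utility, lognormal sampled coefficients, outside utility zero, two firms); the helper `step_and_gradient` and all numerical values are illustrative. The combined gradient is computed from the intra-firm Jacobian rather than from $\zeta$, so the comparison is not circular.

```python
import numpy as np

rng = np.random.default_rng(2)
alpha = rng.lognormal(0.0, 0.5, size=300)    # hypothetical sampled price coefficients
v = np.array([1.0, 0.0, -1.0])
c = np.array([0.5, 0.5, 0.5])
firm = np.array([0, 0, 1])
same_firm = firm[:, None] == firm[None, :]

def step_and_gradient(p):
    e = np.exp(-np.outer(alpha, p) + v)
    PL = e / (1.0 + e.sum(axis=1, keepdims=True))
    P = PL.mean(axis=0)
    lam = -(alpha[:, None] * PL).mean(axis=0)
    Gam_t = -(PL.T @ (alpha[:, None] * PL)) / len(alpha) * same_firm
    grad = lam * (p - c) - Gam_t.T @ (p - c) + P      # (D~P)^T (p - c) + P
    zeta = (Gam_t.T @ (p - c) - P) / lam              # Eqn. (18)
    return c + zeta - p, grad, np.abs(lam)

bounds_hold = True
for _ in range(100):
    p = rng.uniform(0.1, 5.0, size=3)
    step, grad, abs_lam = step_and_gradient(p)
    ratio = (grad @ step) / (grad @ grad)
    cos = (grad @ step) / (np.linalg.norm(grad) * np.linalg.norm(step))
    bounds_hold = bounds_hold and bool(
        (1 - 1e-9) / abs_lam.max() <= ratio <= (1 + 1e-9) / abs_lam.min()
    )
    bounds_hold = bounds_hold and bool(cos >= abs_lam.min() / abs_lam.max() - 1e-9)

print("Lemma 3.7 bounds hold on all samples:", bounds_hold)
```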

If $F = 1$, and the equilibrium problem is an optimization problem, this implies ζ-FPI
has steps that point in gradient ascent directions and, when properly scaled, converge
to local maximizers of profit. More specifically, ζ-FPI cannot converge to minimizers of
profits. This may generate the properties of ζ-FPI observed in Example 10 from Morrow
and Skerlos (2010).

Corollary 3.8. Let Assumptions 2.1-2.4 hold, and suppose $\{p^{(n)}\}_{n=1}^{\infty}$ is the $\zeta$-FPI sequence. Then $\{p^{(n)}\}_{n=1}^{\infty}$ is bounded.

Proof. By Lemma 2.13, for any sufficiently large $M > 0$ we can find some $L > 0$ such that

$$\|\zeta(p)\|_\infty < \|p - c\|_\infty - M \quad \text{for all } \|p - c\|_\infty > L.$$

If the $\zeta$-FPI sequence diverges, then for any such $L$ there is an $N$ such that

$$\|p^{(n)} - c\|_\infty > L \quad \text{for all } n > N.$$

But then

$$\|p^{(n+1)} - c\|_\infty = \|\zeta(p^{(n)})\|_\infty < \|p^{(n)} - c\|_\infty - M < \|p^{(n)} - c\|_\infty \quad \text{for all } n > N,$$

which states that $\|p^{(n)} - c\|_\infty$ is decreasing along the $\zeta$-FPI sequence. This contradicts the hypothesis that the $\zeta$-FPI sequence diverges. □

To implement $\zeta$-FPI, one simply needs to iterate the assignment $p \leftarrow c + \zeta(p)$ where Eqn. (18) defines $\zeta(p)$. As shown in Table 5 below, integral approximations, rather than the actual computation of the step, drive the computational burden. Given a price vector, utilities, and utility derivatives, computing $P(p)$, $\Lambda(p)$, and $\tilde{\Gamma}(p)$ for a set of $S$ samples requires $\mathcal{O}(S \sum_{f=1}^F J_f^2)$ floating point operations (flops), while the fixed-point step itself only requires $\mathcal{O}(\sum_{f=1}^F J_f^2)$ flops. Note that computing the fixed-point step $c + \zeta(p)$ requires an equivalent amount of work as computing the combined gradient $(\tilde{\nabla}\hat{\pi})(p)$. Furthermore, because $\Lambda(p)$ is a diagonal matrix, no serious obstacles to computing the fixed-point step arise as $J$ becomes large.
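
The iteration above can be sketched in a few lines. This is only an illustration under assumptions that are not taken from the paper: utilities are linear in price, $u_{s,j} = v_{s,j} - \alpha_s p_j$, the outside good has utility zero, and all function and variable names are hypothetical. The step uses $\zeta(p) = \Lambda(p)^{-1}(\tilde{\Gamma}(p)^\top (p - c) - P(p))$, which is what one obtains by equating the two expressions for the combined gradient listed in Table 5.

```python
import numpy as np

def logit_probabilities(p, alpha, v):
    """S x J matrix L(p) of Logit choice probabilities (outside good utility 0).

    Assumed utility, for illustration only: u[s, j] = v[s, j] - alpha[s] * p[j].
    """
    u = v - alpha[:, None] * p[None, :]
    m = np.maximum(u.max(axis=1), 0.0)           # shift for numerical stability
    e = np.exp(u - m[:, None])
    return e / (np.exp(-m) + e.sum(axis=1))[:, None]

def zeta_and_gradient(p, c, alpha, v, firm):
    """Sample-average zeta(p), combined gradient Lambda (p - c - zeta), and P(p)."""
    L = logit_probabilities(p, alpha, v)
    V = L * (-alpha[:, None])                    # element-wise L . D, with Dw = -alpha_s
    S = L.shape[0]
    P = L.mean(axis=0)
    lam = V.mean(axis=0)                         # diagonal of Lambda(p); all negative
    Gamma = (L.T @ V) / S                        # dense here; only intra-firm blocks are needed
    same_firm = firm[:, None] == firm[None, :]
    Gamma_t = np.where(same_firm, Gamma, 0.0)    # intra-firm part of Gamma
    zeta = (Gamma_t.T @ (p - c) - P) / lam       # no linear solve: Lambda is diagonal
    grad = lam * (p - c - zeta)
    return zeta, grad, P

def zeta_fpi(p0, c, alpha, v, firm, tol=1e-8, max_iter=500):
    """Iterate p <- c + zeta(p) until the combined gradient is small in the sup norm."""
    p = np.asarray(p0, dtype=float).copy()
    for n in range(max_iter):
        zeta, grad, _ = zeta_and_gradient(p, c, alpha, v, firm)
        if np.max(np.abs(grad)) < tol:
            return p, n, True
        p = c + zeta
    return p, max_iter, False
```

With a single sample ($S = 1$) this reduces to the simple Logit model, for which each firm's equilibrium markup is $1/(\alpha(1 - Q_f))$ with $Q_f$ the firm's total share; that closed form gives a convenient sanity check for the sketch.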

3.4.2. $\eta$ Fixed-Point Iteration. The fixed-point iteration $p \leftarrow c + \eta(p)$, abbreviated $\eta$-FPI, based on the BLP-markup equation need not converge. Example 7 below, repeated from Morrow and Skerlos (2010), gives a case in which $\eta$-FPI can fail to be even locally convergent.

Example 7. Consider multi-product monopoly pricing with a simple Logit model having $u_j(p) = -\alpha p + v_j$ for some $\alpha > 0$, any $v_j \in \mathbb{R}$, and $\vartheta > -\infty$. It is well known that for a single-product firm, unique profit-maximizing prices exist (Anderson and de Palma, 1988; Milgrom and Roberts, 1990; Caplin and Nalebuff, 1991). Morrow (2008) proves that profit-optimal prices $p_*$ are unique for the multi-product case, and even so with multiple firms, even though profits are not quasi-concave (Hanson and Martin, 1996).

In this example, $\eta$-FPI is not always locally convergent near $p_*$, while $\zeta$-FPI is always superlinearly locally convergent. For an arbitrary continuously differentiable function $F$ and $p_* = F(p_*)$, $F$ is contractive on some neighborhood of $p_*$ in some norm $\|\cdot\|$ if $\rho((DF)(p_*)) < 1$, where $\rho(A)$ denotes the spectral radius of the matrix $A$ (Ortega and Rheinboldt, 1970). We show that $\rho((D\eta)(p_*)) > 1$ may hold while $\rho((D\zeta)(p_*)) = 0$.

The components of the BLP-markup function $\eta$ are given by $\eta_k(p) = \alpha^{-1}\big(1 - \sum_{j=1}^J P_j(p)\big)^{-1}$ for all $k$. From this formula the equation

$$\rho((D\eta)(p)) = \frac{\sum_{j=1}^J P_j(p)}{1 - \sum_{j=1}^J P_j(p)}$$

can be derived. For valuations of the outside good, $\vartheta$, sufficiently close to $-\infty$, $\rho((D\eta)(p_*)) > 1$ can hold; see Morrow and Skerlos (2010) for details.

To prove the claim regarding $\rho((D\zeta)(p_*))$, note that $\zeta_k(p) = \hat{\pi}(p) + 1/\alpha$, and thus $(D_l\zeta_k)(p_*) = (D_l\hat{\pi})(p_*) = 0$ for all $k, l$.
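
The two spectral radii in Example 7 can be checked numerically. The sketch below is illustrative only: it assumes the simple Logit monopoly of the example with an outside-good utility `theta`, uses central finite differences for the Jacobians (the paper's analysis is exact), and all names are hypothetical.

```python
import numpy as np

def shares(p, alpha, v, theta):
    """Simple Logit choice probabilities with outside-good utility theta."""
    u = v - alpha * p
    m = max(u.max(), theta)                      # shift for numerical stability
    e = np.exp(u - m)
    return e / (np.exp(theta - m) + e.sum())

def eta_map(p, c, alpha, v, theta):
    """BLP markup for the monopolist: 1 / (alpha * outside-good share) for every product."""
    P0 = 1.0 - shares(p, alpha, v, theta).sum()
    return np.full(p.size, 1.0 / (alpha * P0))

def zeta_map(p, c, alpha, v, theta):
    """zeta markup for the monopolist: profit per customer plus 1 / alpha."""
    P = shares(p, alpha, v, theta)
    return np.full(p.size, P @ (p - c) + 1.0 / alpha)

def jacobian_fd(fun, p, h=1e-5):
    """Central finite-difference Jacobian (adequate for this small illustration)."""
    J = np.empty((p.size, p.size))
    for l in range(p.size):
        step = np.zeros(p.size)
        step[l] = h
        J[:, l] = (fun(p + step) - fun(p - step)) / (2.0 * h)
    return J

def spectral_radius(A):
    return float(np.max(np.abs(np.linalg.eigvals(A))))

def example7_radii(c, alpha, v, theta, max_iter=500):
    """Find p* with zeta-FPI, then estimate rho(D eta) and rho(D zeta) at p*."""
    p = c + 1.0 / alpha
    for _ in range(max_iter):
        p_new = c + zeta_map(p, c, alpha, v, theta)
        converged = np.max(np.abs(p_new - p)) < 1e-12
        p = p_new
        if converged:
            break
    rho_eta = spectral_radius(jacobian_fd(lambda q: eta_map(q, c, alpha, v, theta), p))
    rho_zeta = spectral_radius(jacobian_fd(lambda q: zeta_map(q, c, alpha, v, theta), p))
    return p, rho_eta, rho_zeta
```

Calling `example7_radii` with a very unattractive outside good (a strongly negative `theta`) should give a radius above one for $\eta$ and a radius near zero for $\zeta$, in line with the claims of the example.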

Even if the BLP-markup equation does generate a convergent fixed-point iteration, evaluating $\eta$ involves the solution of $F$ linear systems that grow in size with the number of products offered by the firms. The work required to evaluate $\eta$ using a direct method like PLU or QR factorization is $\mathcal{O}([\max_f J_f]^3)$, given values of $P(p)$, $\Lambda(p)$, and $\tilde{\Gamma}(p)$ as approximated using simulation. The work to evaluate $\zeta$ is only $\mathcal{O}([\max_f J_f]^2)$ given $P(p)$, $\Lambda(p)$, and $\tilde{\Gamma}(p)$ (Table 5). Generally speaking, function evaluations must be cheap for the linear convergence of fixed-point iterations to result in faster computations than the superlinearly or quadratically convergent variants of Newton's method.

3.5. Practical Considerations. This section addresses several practical considerations. 

3.5.1. Simulation. Any method for computing equilibrium prices under Mixed Logit models faces a common obstacle: the integrals that define the choice probabilities ($P$) and their derivatives ($\Lambda$, $\Gamma$) cannot be computed exactly. We employ finite-sample versions of the methods discussed below by drawing $S \in \mathbb{N}$ samples from the demographic distribution and applying the method to the finite-sample model thus generated. Particularly, these samples are used to compute approximate $P(p)$, $\Lambda(p)$, and $\tilde{\Gamma}(p)$; see Table 5. These samples are kept fixed for all steps of the method and, in principle, can be generated in any way. We draw directly from the demographic distribution, although importance and quasi-random sampling (e.g., see Train (2003)) can also be employed. The Law of Large Numbers motivates this widely-used approach to econometric analysis (e.g., see McFadden (1989) and Draganska and Jain (2004)). While all numerical approaches for computing equilibrium prices described here rely on a Law of Large Numbers for simultaneously stationary prices, we do not provide a formal convergence theorem. We do provide numerical evidence that computed equilibrium prices based on the fixed-point iteration for our examples do indeed follow such a law.

3.5.2. Truncation of Low Purchase Probability Products. All of the methods we implement can be built to ignore products with excessively low choice probabilities. That is, one can ignore price updates for all products with $P_j(p) < \varepsilon_P$, where $\varepsilon_P$ is some small value (say 10~^°). Products with a choice probability this small (or smaller) need not be considered a part of the market in the price equilibrium computations. For example, Wards (2007) reports total sales of cars and light trucks during 2005 as $N = 16{,}947{,}754$ (7,667,066 cars and 9,280,688 light trucks). Because expected demand is defined by $E[Q_j(p)] = N P_j(p)$, any $\varepsilon_P < 0.5 N^{-1} \approx 3 \times 10^{-8}$ ignores any vehicle that, as priced, is not expected to have a single customer out of the millions of customers that bought or considered buying new vehicles. There are also technical reasons for this truncation. Particularly, $\Lambda(p)$ and $(\tilde{D}\tilde{\nabla}\hat{\pi})(p)$ become singular as $P_j(p) \to 0$, for any $j$. Truncating avoids this singularity and hopefully helps conditioning.
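
As a small illustration of this truncation rule, the sketch below computes the half-a-customer bound from the sales figure quoted above and builds a mask of products to keep. The function name and the particular cutoff used in the usage note are hypothetical choices for illustration, not values from the paper.

```python
import numpy as np

def active_products(P, eps_P):
    """Boolean mask of products kept in the equilibrium computation (P_j >= eps_P)."""
    return np.asarray(P, dtype=float) >= eps_P

# Total 2005 sales of cars and light trucks quoted in the text.
N_SALES = 7_667_066 + 9_280_688
# A product with P_j below this bound is expected to sell less than half a unit.
HALF_CUSTOMER_BOUND = 0.5 / N_SALES
```

For instance, `active_products([0.1, 1e-12, 5e-9], 1e-9)` keeps the first and third products; price updates, and the corresponding rows and columns of any Newton system, would then be restricted to the kept indices.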

3.5.3. Termination Conditions. We terminate all iterations with the numerical simultaneous stationarity condition $\|(\tilde{\nabla}\hat{\pi})(p)\|_\infty < \varepsilon_T$ where $\varepsilon_T$ is some small number (e.g., 10~^). Note that a standard application of Newton's method to solve $F_\eta(p) = 0$ or $F_\zeta(p) = 0$ would terminate when either

(23) $$\|p - c - \eta(p)\|_\infty < \varepsilon_T \quad \text{or} \quad \|p - c - \zeta(p)\|_\infty < \varepsilon_T,$$

respectively. For example, Aguirregabiria and Vicentini (2006) use the condition $\|p - c - \eta(p)\|_\infty < \varepsilon_T$. Ensuring that Eqn. (23) holds does not necessarily imply that $\|(\tilde{\nabla}\hat{\pi})(p)\|_\infty < \varepsilon_T$, the strictly interpreted first-order condition. Because

$$(\tilde{D}P)(p)^\top (p - c - \eta(p)) = (\tilde{\nabla}\hat{\pi})(p) = \Lambda(p)(p - c - \zeta(p)),$$

it is easy to terminate all methods, CG-NM, $\eta$-NM, $\zeta$-NM, and $\zeta$-FPI, when $\|(\tilde{\nabla}\hat{\pi})(p)\|_\infty < \varepsilon_T$. While this is done here to ensure consistency in our comparisons of different methods, $\|(\tilde{\nabla}\hat{\pi})(p)\|_\infty < \varepsilon_T$ should always be the termination condition for price equilibrium computations.

Three other standard termination conditions are used (Brown and Saad, 1990; Dennis 
and Schnabel, 1996). We terminate the iteration if the (relative) step length becomes too 
small, if a maximum number of iterations is exceeded, or if an exceptional event occurs 
(e.g. division by zero). These three conditions are considered "failure" as the iteration has 
failed to compute a numerically simultaneously stationary point in the sense of the first 
termination condition. 

3.5.4. Second-Order Conditions. Each method in Table 4 finds simultaneously stationary 
points, rather than local equilibria. Unlike in optimization, there is no a priori assurance 
that first-order iterative methods for equilibrium problems will converge to certain types 
of stationary points. Thus in computing equilibria it is vitally important to check the 
second-order sufficient conditions to verify that a local equilibrium has indeed been found. 

In local equilibrium every firm's profit Hessian, $(D_f\nabla_f\hat{\pi}_f)(p)$, should also be negative definite. The formulas given in Proposition 3.9 below provide an expression for $(D_f\nabla_f\hat{\pi}_f)(p)$ that we use to check the second-order sufficient condition. Cholesky factorization, rather than direct approximation of the spectrum, is used to test the negative definiteness of $(D_f\nabla_f\hat{\pi}_f)(p)$ (Golub and Loan, 1996).
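
A Cholesky-based definiteness test of this kind can be sketched as follows. This is a generic illustration, not the authors' code: a matrix $H$ is negative definite exactly when $-H$ admits a Cholesky factorization, and the function name is hypothetical. The input is symmetrized first because `numpy.linalg.cholesky` does not check symmetry.

```python
import numpy as np

def is_negative_definite(H):
    """Return True if the quadratic form of H is negative definite.

    Attempts a Cholesky factorization of -(H + H^T)/2, which succeeds
    exactly when that symmetric matrix is positive definite.
    """
    H = np.asarray(H, dtype=float)
    H_sym = 0.5 * (H + H.T)
    try:
        np.linalg.cholesky(-H_sym)
        return True
    except np.linalg.LinAlgError:
        return False
```

In the second-order check, this test would be applied to each firm's diagonal block of the Jacobian of the combined gradient at the computed prices.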

3.5.5. Computational Burden. Table 5 reviews the formulae and computational burden of computing $(\tilde{\nabla}\hat{\pi})$, $\eta$, and $\zeta$.

Computing $\eta$ and applying Newton's method to $F_\eta$ requires solving linear systems. We give some more details regarding these computations here. As stated above, the linear system

$$(I - \tilde{\Omega}(p))\,\eta(p) = -\Lambda(p)^{-1} P(p)$$

should be used to solve for $\eta(p)$. Note also that only the systems

$$(I - \Omega_f(p))\,\eta_f(p) = -\Lambda_f(p)^{-1} P_f(p)$$

for all $f$ need be solved. Of course, our condition bound applies within firms as well:

.2mPf)ip) )>^— ^— -p^jK,(l-f^,(p))- 

If Householder QR factorization is used to solve these systems, then computing $\eta(p)$ from $P(p)$, $\Lambda(p)$, and $\tilde{\Gamma}(p)$ requires $\mathcal{O}(\sum_{f=1}^F J_f^3)$ flops (Table 5).

This is a significant increase in computational effort relative to computing $\zeta(p)$ or $(\tilde{\nabla}\hat{\pi})(p)$. The diagonal dominance of $I - \tilde{\Omega}(p)$, indeed of $(\tilde{D}P)(p)$ itself, suggests that Jacobi, Gauss-Seidel, and Successive Over-Relaxation (SOR) iterations (Golub and Loan, 1996) may be a relatively efficient way to compute $\eta$.
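
The firm-by-firm systems for $\eta$ can be sketched as below, once with a dense direct solve and once with a plain Jacobi iteration. This is a hedged illustration, not the authors' implementation: it assumes $\tilde{\Omega}(p) = \Lambda(p)^{-1}\tilde{\Gamma}(p)^\top$ (consistent with the two expressions for the combined gradient in Table 5), takes sample-average values of $P$, the diagonal of $\Lambda$, and $\tilde{\Gamma}$ as inputs, and uses hypothetical names throughout.

```python
import numpy as np

def _firm_system(P, lam, Gamma_t, idx):
    """Build A = I - Omega_f and b = -Lambda_f^{-1} P_f for one firm's products."""
    Omega_f = Gamma_t[np.ix_(idx, idx)].T / lam[idx, None]   # row k scaled by 1/lambda_k
    return np.eye(idx.size) - Omega_f, -P[idx] / lam[idx]

def eta_direct(P, lam, Gamma_t, firm):
    """Solve each firm's system with a dense direct solver."""
    eta = np.empty(P.size)
    for f in np.unique(firm):
        idx = np.flatnonzero(firm == f)
        A, b = _firm_system(P, lam, Gamma_t, idx)
        eta[idx] = np.linalg.solve(A, b)
    return eta

def eta_jacobi(P, lam, Gamma_t, firm, tol=1e-12, max_iter=1000):
    """Solve each firm's system with Jacobi iteration (relies on diagonal dominance)."""
    eta = np.empty(P.size)
    for f in np.unique(firm):
        idx = np.flatnonzero(firm == f)
        A, b = _firm_system(P, lam, Gamma_t, idx)
        d = np.diag(A)
        x = b / d
        for _ in range(max_iter):
            if np.max(np.abs(A @ x - b)) <= tol * np.max(np.abs(b)):
                break
            x = (b - (A @ x - d * x)) / d        # x <- D^{-1} (b - (A - D) x)
        eta[idx] = x
    return eta
```

For a simple Logit model with price coefficient $\alpha$, both routines should return $1/(\alpha(1 - Q_f))$ for every product of firm $f$, where $Q_f$ is the firm's total share.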

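Additional work is required to compute $(D\eta)(p)$, if this is to be used in Newton's method. Though it requires solving a matrix-linear system of the type $(\tilde{D}P)(p)^\top (D\eta)(p) = B(p)$, the required matrix factorizations of $I - \tilde{\Omega}(p)$ need only be computed once to compute both $\eta$ and $(D\eta)$, but must be updated for each vector of prices.

Table 5. Work required to evaluate $(\tilde{\nabla}\hat{\pi})$, $\eta$, and $\zeta$ given $S$ samples $\{\theta_s\}_{s=1}^S \subset \mathcal{T}$, an $S \times J$ matrix $L(p)$ of Logit choice probabilities ($(L(p))_{s,j} = P_j^L(\theta_s, p)$), and an $S \times J$ matrix of utility derivatives $D(p)$ ($(D(p))_{s,j} = (Dw_j)(\theta_s, p_j)$). The first section gives work required for sample-average approximations to $P(p)$, $\Lambda(p)$, and $\tilde{\Gamma}(p)$. The second section takes $P(p)$, $\Lambda(p)$, and $\tilde{\Gamma}(p)$ as given.

Quantity                                Formula                                                                 flops
----------------------------------------------------------------------------------------------------------------------------------
$P(p)$                                  $S^{-1} L(p)^\top \mathbf{1}$                                           $SJ$
$V(p)$                                  $L(p) \bullet D(p)$ (a)                                                 $SJ$
$\Lambda(p)$                            $S^{-1} V(p)^\top \mathbf{1}$                                           $SJ$
$\tilde{\Gamma}(p)$                     $S^{-1} L(p)^\top V(p)$                                                 $2S \sum_{f=1}^F J_f^2$
Total work to compute $P(p)$, $\Lambda(p)$, and $\tilde{\Gamma}(p)$                                             $S\big(3J + 2\sum_{f=1}^F J_f^2\big)$
----------------------------------------------------------------------------------------------------------------------------------
$\zeta(p)$                              $\tilde{\Omega}(p)(p - c) - \Lambda(p)^{-1} P(p)$                       $2\sum_{f=1}^F J_f^2 + 4J$
$\eta(p)$                               $(I - \tilde{\Omega}(p))\eta(p) = -\Lambda(p)^{-1} P(p)$                $\mathcal{O}\big(\sum_{f=1}^F J_f^3\big)$
$(\tilde{\nabla}\hat{\pi})(p)$          $(\Lambda(p) - \tilde{\Gamma}(p)^\top)(p - c) + P(p)$                   $2\sum_{f=1}^F J_f^2 + 5J$
                                        $= \Lambda(p)(p - c - \zeta(p))$                                        $2\sum_{f=1}^F J_f^2 + 6J$

(a) "$\bullet$" here denotes element-by-element multiplication.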

3.6. Computing Jacobian Matrices for Newton's Method. Standard "exact" or Quasi-Newton methods to solve $F(x) = 0$ either always or periodically require the Jacobian matrix $(DF)(x)$. Using finite differences to approximate Jacobian matrices requires $J$ evaluations of the function $F$, an unacceptable workload. In the 993 vehicle example from Morrow and Skerlos (2010), approximating $(DF)(x)$ once with finite differences would take roughly 993 evaluations of $F$, when the work of fewer than 50 evaluations appears to be sufficient to converge to equilibrium prices using the $\zeta$-FPI.

We recommend directly approximating $(DF)(x)$ using integral expressions for $(\tilde{D}\tilde{\nabla}\hat{\pi})(p)$, $(D\eta)(p)$, and $(D\zeta)(p)$ provided below. An alternative is to use automatic differentiation, but we are skeptical that this would in fact be faster than the direct formulae provided here.

3.6.1. Jacobian of the Combined Gradient. Assuming a second application of the Leibniz Rule holds, we can derive integral expressions for the second derivatives $(D_l D_k \hat{\pi}_{f(k)})(p)$ through

$$\big((\tilde{D}\tilde{\nabla}\hat{\pi})(p)\big)_{k,l} = (D_l D_k \hat{\pi}_{f(k)})(p) = \int (D_l D_k \hat{\pi}^L_{f(k)})(\theta, p)\, d\mu(\theta).$$

Proposition 3.9. Let $w$ be twice continuously differentiable in $p$ and suppose a second application of the Leibniz Rule holds for the Mixed Logit choice probabilities at $p$. Set



V^fc,Kp) = / {Dwk){e,pk)Pt{e,v)^'i,k.{e,p)Pt{e,^){Dwi){e,pi)d^ji{e) 



Mv) = / {Dwk){e,pk)Pt{o,v)Pr{d.v){Dwi){9,pi)dti{e) 
Xfc(p) = Q) I {{D^Wk){e,pk) + {Dwk){e,pkf) 

X P,^(0,p)((pfc -cfc) -7r^(,)(0,p))d/.(0) 

(i) Component form: Setting 

6,/(p) = 4,/(Afc(p) + Xfc(p)) - 7fc,;(p) - (Pk - Ck)^k,i{p) 

we have 

(AA'Vr/(fc))(p) = Cfe,«(p) + 2V'fe,«(p) + '5/(fe),/(/)6,fc(p) 

(ii) Matrix form: Let ^{p), ^(p) and X(p) = diag(x(p)) be the matrices of these 
quantities. Also set 

H(p) = A(p) - r(p) - diag(p - c)*(p) + X(p). 

and 

'^kAP) ^ff{k) = f{l) 



(3(p))a 

Then 



^ffik)^f{l) 



(24) pV7r)(p) = H(p) + 2*(p) + H(p)T. 

Proof. To see that this only relies on a second application of the Leibniz Rule to the choice probabilities, note that

$$(D_l D_k \hat{\pi}_{f(k)})(p) = \sum_{j \in \mathcal{J}_{f(k)}} (D_l D_k P_j)(p)(p_j - c_j) + \delta_{f(k),f(l)} (D_k P_l)(p) + (D_l P_k)(p)$$

and thus the continuous second-order differentiability of $\hat{\pi}_f(p)$ depends only on the second-order continuous differentiability of $P_j$. This result is then an immediate consequence of the validity of the Leibniz Rule, if a bit tedious to derive. □

The validity of a second application of the Leibniz Rule to the choice probabilities is 
ensured by the following condition. 

Proposition 3.10. Let $(u, \vartheta, \mu) = (w + v, \vartheta, \mu)$ be such that

(i) $w(\theta, y, \cdot) : (0, \varsigma_*) \to \mathbb{R}$ is twice continuously differentiable for all $y \in \mathcal{Y}$ and $\mu$-a.e. $\theta \in \mathcal{T}$

(ii) for aU{y,p)eyx{0,^,), \(D^w){.,y,q) + {Dw){-,y,qf\e<'y'''^~^(-^ : T ^ [0,oo) 
is uniformly ^-integrahle for all q in some neighborhood of p. 

(iii) for all (y,p), (y',p') G 3^ x (0,?,), 

I y,g)|e"(-'y''?)-''(-)e<'y''''')-'''(-) \{Dw){-,y' ,q')\ : T ^ [0,oo) 

is uniformly ^-integrahle for all {q,q') in some neighborhood of {p,p')- 

Then a second application of the Leibniz Rule holds for the Mixed Logit choice probabilities, which are also continuously differentiable on $(0, \varsigma_*)^J$.

This is proved in the same manner as Proposition 2.4. 
We also observe the following. 

Proposition 3.11. If $P_k(p) = 0$ then $(D_l D_k \hat{\pi}_{f(k)})(p) = (D_k D_l \hat{\pi}_{f(l)})(p) = 0$ for all $l \in \mathbb{N}(J)$.

The proof follows from the derivative formulae given above. Of course, if $P_k(p) = 0$ then $(D_k \hat{\pi}_{f(k)})(p) = 0$ as well and we have the following situation: (i) the Newton system is consistent for any $s^N_k(p) \in \mathbb{R}$ and (ii) $s^N_l(p)$ does not depend on $s^N_k(p)$ for all $l \in \mathbb{N}(J) \setminus \{k\}$. Thus, in practice one can restrict attention to the Newton step defined by the submatrix of $(\tilde{D}\tilde{\nabla}\hat{\pi})(p)$ formed by rows and columns indexed by $\{j : P_j(p) > \varepsilon_P\}$.

The formulae above give the following expression of the profit Hessians. 

Corollary 3.12. Let w be twice continuously differentiable in p and suppose a second 
application of the Leibniz Rule holds for the Mixed Logit choice probabilities. Firm f 's 
profit Hessian is given by 

{DfVffff){p) = Hjj(p) + 2^fj{p) + Hjj(p)^. 

3.6.2. The $\eta$ map. For $F_\eta(p) = p - c - \eta(p)$, we have $(DF_\eta)(p) = I - (D\eta)(p)$ where $(D\eta)(p)$ solves the linear matrix equation

$$(\tilde{D}P)(p)^\top (D\eta)(p) = -\big(A(p) + (DP)(p)\big).$$

Here $(A(p))_{k,l} = \sum_{j \in \mathcal{J}_{f(k)}} (D_l D_k P_j)(p)\,\eta_j(p)$. This is easily derived by differentiating the defining formula $(\tilde{D}P)(p)^\top \eta(p) = -P(p)$.

3.6.3. The $\zeta$ map. For $F_\zeta(p) = p - c - \zeta(p)$, we have $(DF_\zeta)(p) = I - (D\zeta)(p)$ where $(D\zeta)(p)$ can be computed using the following formula:

(DiCk) = K 

+ Ck(t>k,i + ik,i + Sf{k)j{i)4>k,iipi - q) + ^f{k),f{i)ii,k - '2ipk,i ■ 



I Pt{{D^Wk) + {Dwkf) 






4. The GMRES-Newton Hookstep Method 

In this section we provide some details regarding the GMRES-Newton Hookstep method 
employed in Morrow and Skerlos (2010). For complete details, see ?. 

4.1. Inexact Newton Methods. A strong theory of "Inexact" Newton methods exists for the solution of systems of nonlinear equations when there are "many" variables. Inexact Newton steps are simply "inexact" solutions to the Newton system; that is, an inexact Newton step $s^{IN}$ is any vector that satisfies

(25) $$\|F(x) + (DF)(x)s^{IN}\| \le \delta \|F(x)\|$$

for some fixed $\delta \in (0, 1)$ (Dembo et al., 1982; Brown and Saad, 1990; Eisenstat and Walker, 1994, 1996; Pernice and Walker, 1998). The name "truncated" Newton method has also been used for the specific case when the inexactness comes from the use of iterative linear system solvers like GMRES (Saad and Schultz, 1986; Walker, 1988) or BiCGSTAB (van der Vorst, 1992; Sleijpen and Fokkema, 1993). We focus on GMRES, a particularly simple yet strong iterative method for general linear systems that has been consistently used in the context of solving nonlinear systems (Brown and Saad, 1990).

By appropriately choosing a sequence of $\delta$'s, the local asymptotic convergence rate of an inexact Newton's method can be fully quadratic (Dembo et al., 1982; Eisenstat and Walker, 1994). Of course, taking $\delta \to 0$ to achieve the quadratic convergence rate will also require increasingly burdensome computations of inexact Newton steps that satisfy increasingly strict inexact Newton conditions. On the other hand, $\delta$ can be chosen to be a constant if a linear locally asymptotic convergence rate is suitable (Pernice and Walker, 1998).

Generally speaking there are three reasons to adopt the inexact perspective. First, direct 
methods like QR factorization may not be the most effective means to solve the Newton 
system when this system is large, because of computational burden and accumulation of 
roundoff errors. Instead, iterative solution methods are often used to solve linear systems 
with many variables; see, e.g., Trefethen and Bau (1997). Second, iterative methods like
GMRES require only matrix-vector products $(DF)(p)s$ that can be approximated with finite
directional derivatives (Brown and Saad, 1990; Pernice and Walker, 1998). Thus inexact
Newton's methods can be "matrix-free"; see Section 4.3.4 below. Third, Newton steps
often point in inaccurate directions when far from a solution (Pernice and Walker, 1998). 
Thus solving for exact Newton steps may involve wasted effort, especially when there are 
many variables. 
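
The matrix-free idea in the second point can be illustrated with a forward-difference directional derivative. This is a generic sketch rather than the authors' implementation; the function name and the step-size heuristic are assumptions made for illustration.

```python
import numpy as np

def directional_derivative(F, x, s, h=None):
    """Approximate the Jacobian-vector product (DF)(x) s by a forward difference.

    Costs one extra evaluation of F and never forms the Jacobian.
    The default step is a common heuristic, chosen here only for illustration.
    """
    x = np.asarray(x, dtype=float)
    s = np.asarray(s, dtype=float)
    norm_s = np.linalg.norm(s)
    if norm_s == 0.0:
        return np.zeros_like(F(x))
    if h is None:
        h = np.sqrt(np.finfo(float).eps) * (1.0 + np.linalg.norm(x)) / norm_s
    return (F(x + h * s) - F(x)) / h
```

A Krylov solver such as GMRES only needs this product for a handful of directions per Newton step, which is why the full Jacobian can be avoided.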

MATLAB's fsolve function implements a related approach using the (preconditioned) Conjugate Gradient (CG) method applied to the normal equation for the Newton system, $(DF)(p)^\top (DF)(p) s^N = -(DF)(p)^\top F(p)$. Use of the normal equations is required because CG is applicable only to symmetric systems (Trefethen and Bau, 1997). Note that this requires that the Jacobian $(DF)$ is explicitly available. Although this holds for price equilibrium problems under Mixed Logit models, it can be a significant restriction for general problems. By requiring products $(DF)(p)^\top h$ in each step of the iterative linear solver, this approach also increases the work by $\mathcal{O}(NJ^2)$ flops where the solver takes $N$ steps. Finally, this approach can also be less accurate: using the normal equation squares the linear problem's condition number, and thus risks serious degradation in solution quality (Trefethen and Bau, 1997). Pernice and Walker (1998) describe a similar approach using BiCGSTAB: the extension of CG to non-symmetric systems.
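
The squaring of the condition number is easy to see numerically. The following is a small, generic check in the 2-norm; the function name is hypothetical.

```python
import numpy as np

def normal_equation_condition(A):
    """Return (cond_2(A), cond_2(A^T A)); the second equals the square of the first."""
    A = np.asarray(A, dtype=float)
    return np.linalg.cond(A), np.linalg.cond(A.T @ A)
```

For a Jacobian with condition number near $10^8$, the normal equations would have a condition number near $10^{16}$, at which point double-precision solves carry essentially no accurate digits.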

4.2. GMRES. The "Generalized Minimum Residuals" or GMRES method (Saad and Schultz, 1986) solves a linear system $Ax = b$ by using the Arnoldi process to compute an orthonormal basis of the successive Krylov subspaces $\mathcal{K}^{(n)}$ and then takes approximate solutions from those subspaces having least-squares residuals. See Trefethen and Bau (1997) for a good introduction to Krylov methods in general, including the Arnoldi process and GMRES. In the $n$-th stage, GMRES "factors" $A$ as $AQ^{(n)} = Q^{(n+1)}H^{(n)}$ where $Q^{(n)} \in \mathbb{R}^{N \times n}$ is an orthonormal basis for $\mathcal{K}^{(n)}$, $Q^{(n+1)} \in \mathbb{R}^{N \times (n+1)}$ is an orthonormal basis for $\mathcal{K}^{(n+1)} \supset \mathcal{K}^{(n)}$, and $H^{(n)} \in \mathbb{R}^{(n+1) \times n}$ is upper-Hessenberg. Any vector $x \in \mathcal{K}^{(n)} \subset \mathbb{R}^N$ can be written $x = Q^{(n)}y$ for some $y \in \mathbb{R}^n$ and thus the least-squares residual problem becomes

$$\min_{x \in \mathcal{K}^{(n)}} \|Ax - b\|_2 = \min_{y \in \mathbb{R}^n} \|AQ^{(n)}y - b\|_2 = \min_{y \in \mathbb{R}^n} \|H^{(n)}y - (Q^{(n+1)})^\top b\|_2.$$

The orthonormal basis is typically chosen so that $(Q^{(n+1)})^\top b = \beta e_1$ for some $\beta \in \mathbb{R}$, and hence the GMRES solution is $x^{(n)} = Q^{(n)}y$ where $y$ solves $\min_{y \in \mathbb{R}^n} \|H^{(n)}y - \beta e_1\|_2$. This least squares problem can be solved using the QR factorization of $H^{(n)}$. Furthermore this factorization can be efficiently updated in each iteration, instead of computed from scratch. Moreover the actual solution vector need not be formed until the residual is suitably small.
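
A compact way to see these pieces together is the textbook Arnoldi form of GMRES sketched below. It is an illustration only: it uses modified Gram-Schmidt rather than the Householder variant discussed next, re-solves the small least-squares problem from scratch with `numpy.linalg.lstsq` instead of updating a QR factorization, and its names are hypothetical.

```python
import numpy as np

def gmres_arnoldi(A, b, tol=1e-10, max_iter=None):
    """Unrestarted GMRES started at zero; returns (x, number of Arnoldi steps)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    N = b.size
    max_iter = N if max_iter is None else max_iter
    beta = np.linalg.norm(b)
    if beta == 0.0:
        return np.zeros(N), 0
    Q = np.zeros((N, max_iter + 1))              # orthonormal Krylov basis
    H = np.zeros((max_iter + 1, max_iter))       # upper-Hessenberg matrix
    Q[:, 0] = b / beta                           # so that Q^T b = beta * e_1
    for n in range(1, max_iter + 1):
        w = A @ Q[:, n - 1]
        for i in range(n):                       # modified Gram-Schmidt
            H[i, n - 1] = Q[:, i] @ w
            w -= H[i, n - 1] * Q[:, i]
        H[n, n - 1] = np.linalg.norm(w)
        rhs = np.zeros(n + 1)
        rhs[0] = beta
        y, *_ = np.linalg.lstsq(H[:n + 1, :n], rhs, rcond=None)
        residual = np.linalg.norm(H[:n + 1, :n] @ y - rhs)
        if residual <= tol * beta or H[n, n - 1] <= 1e-14 * beta:
            break                                # converged, or Krylov space exhausted
        Q[:, n] = w / H[n, n - 1]
    return Q[:, :n] @ y, n
```

Note that the residual norm is available from the small $(n+1) \times n$ problem, so the solution vector is only assembled once, after the loop ends.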



4.2.1. Householder GMRES. We have implemented a variant of GMRES based on Householder transformations due to Walker (1988); this is also the version implemented in MATLAB's gmres code. We have verified that our implementation generates results matching MATLAB's implementation. In this version of the GMRES process applied to the generic problem $Ax = b$, Householder reflectors $P^{(n)} \in \mathbb{R}^{N \times N}$ are used to generate the orthonormal matrices

$$Q^{(n)} = P^{(1)} \cdots P^{(n)} \begin{bmatrix} I \\ 0 \end{bmatrix}, \qquad I \in \mathbb{R}^{n \times n}, \quad 0 \in \mathbb{R}^{(N-n) \times n},$$

satisfying

$$AQ^{(n)} = P^{(1)} \cdots P^{(n+1)} \begin{bmatrix} H^{(n)} \\ 0 \end{bmatrix} = Q^{(n+1)} H^{(n)}$$

for upper Hessenberg $H^{(n)} \in \mathbb{R}^{(n+1) \times n}$ and $0 \in \mathbb{R}^{(N-n-1) \times n}$. $P^{(1)}$ is chosen to satisfy $P^{(1)}b = -\beta e_1$ where $\beta = \mathrm{sign}(b_1)\|b\|_2$, and hence $(Q^{(n)})^\top b = -\beta e_1$. The $n$-th approximate solution $x^{(n)}$ is taken to be $x^{(n)} = Q^{(n)}y^{(n)}$ where $y^{(n)} \in \mathbb{R}^n$ solves

$$\min_{y \in \mathbb{R}^n} \|H^{(n)}y - (Q^{(n+1)})^\top b\|_2 = \min_{y \in \mathbb{R}^n} \|H^{(n)}y + \beta e_1\|_2.$$

Again these problems can be solved cheaply by updating QR factorizations with Givens rotations. Neither the solution vector nor the residual vector need be formed until GMRES converges. An efficient implementation requires $\mathcal{O}(Jn)$ flops and a matrix multiply in the $n$-th iteration, so that taking $N$ iterations requires $\mathcal{O}(JN^2)$ flops of "overhead" in addition to the $\mathcal{O}(NJ^2)$ work required for the matrix multiplications (using the actual Jacobians). So long as $N < J$, using GMRES with the actual Jacobians is cheaper than solving the Newton system directly with QR. With small $N$, as we achieve using $\eta$ and $\zeta$, the savings are quite substantial.

We note the following formulae specific to the Newton system case. For $A = (DF)(x)$ and $b = -F(x)$, $\beta = -\mathrm{sign}(F_1(x))\|F(x)\|_2$ and $-\beta e_1 = P^{(1)}b = -P^{(1)}F(x)$ so that

$$P^{(1)}F(x) = \beta e_1 = -\mathrm{sign}(F_1(x))\|F(x)\|_2\, e_1.$$

Moreover, $P^{(n)}e_1 = e_1$ for all $n > 1$ so that

$$(Q^{(n)})^\top F(x) = -\mathrm{sign}(F_1(x))\|F(x)\|_2\, e_1.$$

4.2.2. Preconditioning. As is well known, preconditioning is key to the effectiveness of iterative linear solvers; see Golub and Loan (1996). We have not found the linear systems in $\eta$-NM or $\zeta$-NM to need preconditioning. However, we have found the preconditioned system

(26) $$\Lambda(p)^{-1}(\tilde{D}\tilde{\nabla}\hat{\pi})(p)\,s^N = -\Lambda(p)^{-1}(\tilde{\nabla}\hat{\pi})(p) = c + \zeta(p) - p$$

to be essential for rapid solution of the Newton system in CG-NM. This preconditioner is motivated by the following relationship of the Jacobian of $(\tilde{\nabla}\hat{\pi})$ to the Jacobian of $\zeta$ in equilibrium.

Lemma 4.1. $I - (D\zeta)(p) = \Lambda(p)^{-1}(\tilde{D}\tilde{\nabla}\hat{\pi})(p)$ for any simultaneously stationary $p$.

Proof. This follows from differentiating $(\tilde{\nabla}\hat{\pi})(p) = \Lambda(p)(p - c - \zeta(p))$ via the product rule, recognizing that $p - c - \zeta(p) = 0$ in equilibrium and $D[p - c - \zeta(p)] = I - (D\zeta)(p)$. □

In other words, Newton's method applied to $F_\pi(p)$, preconditioned as above, ends up being essentially the same iteration as Newton's method applied to $F_\zeta(p)$, close enough to equilibrium.

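GMRES, if used successfully on this preconditioned system Eqn. (26), will ensure that

(27) $$\|\Lambda(p)^{-1}(\tilde{\nabla}\hat{\pi})(p) + \Lambda(p)^{-1}(\tilde{D}\tilde{\nabla}\hat{\pi})(p)\,s^{IN}\| \le \delta' \|\Lambda(p)^{-1}(\tilde{\nabla}\hat{\pi})(p)\|$$

for some $\delta'$. This is distinct from the inexact Newton condition Eqn. (25). The following proposition gives modified tolerances for the preconditioned system to ensure satisfaction of the original system.

Proposition 4.2. Let $\delta > 0$ be given. If Eqn. (27) is satisfied with $\delta'(p, \delta) \le \delta$ given by

(28) $$\delta'(p, \delta) = \left( \frac{\|(\tilde{\nabla}\hat{\pi})(p)\|_2}{\max_j |\lambda_j(p)| \; \|\Lambda(p)^{-1}(\tilde{\nabla}\hat{\pi})(p)\|_2} \right) \delta$$

then Eqn. (25) is satisfied.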

This is a consequence of the following general result, which we state without proof.

Lemma 4.3. Let $b \in \mathbb{R}^N$ and let $A, M \in \mathbb{R}^{N \times N}$ be nonsingular. Then

(29) $$\frac{\|Ax - b\|}{\|b\|} \le a\, \frac{\|M^{-1}Ax - M^{-1}b\|}{\|M^{-1}b\|}$$

where $a \in [1, \kappa(M)]$ is given by

$$a = \frac{\|M\|\,\|M^{-1}b\|}{\|b\|}.$$

This implies that

$$\frac{\|Ax - b\|}{\|b\|} \le \delta \quad \text{if} \quad \frac{\|M^{-1}Ax - M^{-1}b\|}{\|M^{-1}b\|} \le \frac{\delta}{a}.$$

Note that, using this bound, the preconditioned system must always be solved to a tolerance at least as strict as the one desired for the un-preconditioned system, because $a \ge 1$. Additionally, computing $a$ for a generic preconditioner $M$ relies on the ability to compute $\|M\|$.
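
For the diagonal preconditioner $M = \Lambda(p)$ the multiplier $a$ is cheap to evaluate in the 2-norm, since $\|\Lambda(p)\|_2$ is the largest $|\lambda_j(p)|$. The sketch below computes $a$ and the adjusted tolerance $\delta/a$; it is an illustration of Lemma 4.3 with hypothetical names, taking the diagonal of $\Lambda$ and the combined gradient as plain vectors.

```python
import numpy as np

def preconditioned_tolerance(lam, grad, delta):
    """Return (delta / a, a) with a = ||Lambda||_2 ||Lambda^{-1} g||_2 / ||g||_2.

    lam  : diagonal of the (nonsingular) diagonal preconditioner Lambda
    grad : right-hand side vector g of the un-preconditioned system
    """
    lam = np.asarray(lam, dtype=float)
    grad = np.asarray(grad, dtype=float)
    a = np.max(np.abs(lam)) * np.linalg.norm(grad / lam) / np.linalg.norm(grad)
    return delta / a, a
```

Solving the preconditioned system to relative residual `delta / a` then guarantees relative residual `delta` in the original system, and `a` never exceeds the condition number of the preconditioner.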

Eqn. (28) simply adopts the 2-norm and applies the formula (Golub and Loan, 1996)

$$\|\Lambda(p)\|_2 = \max_j |\lambda_j(p)|.$$

Eqn. (29) also implies that if Eqn. (27) holds with $\delta' > 0$, then

$$\frac{\|(\tilde{\nabla}\hat{\pi})(p) + (\tilde{D}\tilde{\nabla}\hat{\pi})(p)\,s^{IN}\|_2}{\|(\tilde{\nabla}\hat{\pi})(p)\|_2} \le \kappa_2(\Lambda(p))\,\delta'$$

where $\kappa_2(\Lambda(p)) = \|\Lambda(p)\|_2 \|\Lambda(p)^{-1}\|_2$ is the (2-norm) condition number of $\Lambda(p)$. This equation, while the more compact representation, can also be overly conservative as clearly illustrated in Fig. 1. It is unlikely that $\kappa(\Lambda(p))$ is a tight upper bound on the multiplier in Eqn. (28). In fact, the multiplier on $\delta$ depends only on the norm of $\Lambda(p)^{-1}x$ at a single point on the surface of the unit sphere in $\mathbb{R}^J$ rather than $\|\Lambda(p)^{-1}\|_2$, the maximum norm of $\Lambda(p)^{-1}x$ over this entire sphere. Our examples in Fig. 1 bear this out, having condition numbers many orders of magnitude larger than the multiplier in Eqn. (28).

The power of the preconditioning is that the preconditioned system Eqn. (27) appears
to be solved to a relative error of $\delta'(p, \delta)$ much faster than the original system can be solved
to a relative error of $\delta$, even though $\delta'(p, \delta) \le \delta$. As can be seen in Fig. 1, solving the
preconditioned system to $\delta'(p, \delta)$ can achieve a relative error in the original system below
6 = 10"^*^ in roughly four orders of magnitude fewer iterations than solving the original 
system to this same relative error for prices near equilibrium. Away from equilibrium, 
GMRES may not be able to solve the original system to small relative errors like 10~^ at all. 
Thus using the original system would appear to slow, if not halt, an implementation of the 
inexact Newton's method. 

4.3. The GMRES Hookstep. Suitable modifications of each of the globalization strategies 
originally developed for "exact" Newton methods can be applied in the inexact context. 
Brown and Saad (1990) directly extend line search and dogleg steps to GMRES-Newton methods. Eisenstat and Walker (1996) and Pernice and Walker (1998) apply a safeguarded backtracking line search to facilitate global convergence. More recently, Pawlowski et al. (2006, 2008) have studied dogleg steps suitable for GMRES-Newton methods in some detail. Finally Viswanath (2007) has derived an elegant version of the hookstep method suitable for GMRES-Newton methods. In contrast with the hookstep approach for the "exact" Newton method with Jacobian $(DF)(p)$, Viswanath's approach requires computing the SVD only of a matrix whose size is determined by the number of iterations taken by GMRES. For reasonable applications of GMRES, this can be far less than the size of $(DF)(p)$ itself. For the examples in Morrow and Skerlos (2010), the size difference is roughly two orders of magnitude: the GMRES-Newton hookstep worked with roughly 10 x 10 instead of 1,000 x 1,000 matrices. Thus, the GMRES-hookstep can accumulate tremendous savings over an exact-Newton implementation of the hookstep method. Again, each of these approaches iterates until an acceptable step is found, and can, in principle, involve many additional evaluations of $F$ or fail to find an acceptable step altogether.

Figure 1. Relative error in computed solutions to the CG-NM Newton system and its preconditioned form using GMRES in the vehicle example from Morrow and Skerlos (2010) using the Berry et al. (1995) model. On the top, prices are $p = p_* + 100v$ where $p_*$ are equilibrium prices and $v \in [-1, 1]^J$ is a sample from a uniformly distributed random vector. For this case $\kappa(\Lambda(p))$ = 1.56 X 10^^ while the multiplier in Eqn. (29) is only 106.41. On the bottom, prices are $p = 20{,}000\nu + 5{,}000$ where $\nu$ is a sample from a random vector uniformly distributed on $[0, 1]^J$. For this case $\kappa(\Lambda(p))$ = 4.6 X 10^ while the multiplier in Eqn. (29) is only 10.73. Abbreviations are as follows. REL: relative error in the Newton System; PREL: relative error in the pre-conditioned Newton System; OBREL: our bound, Eqn. (29), on the relative error in the Newton System as determined from the relative error in the preconditioned Newton system; CNBREL: condition number bound on the relative error in the Newton System as determined from the relative error in the preconditioned Newton system.

Here we describe an implementation of the Levenberg-Marquardt method or "hookstep" 
(Dennis and Schnabel, 1996) suitable for GMRES as first suggested by Viswanath (2007). 
See also Viswanath (2009); Viswanath and Cvitanovic (2009); Halcrow et al. (2009). First, 
we recall the basic structure of model trust region methods; see (Dennis and Schnabel, 
1996, Chapter 6, Section 4). We then adapt this structure to the case of Krylov subspace
methods, particularly GMRES. Again, see ? for a more detailed discussion of this method. 

4.3.1. Model Trust Region Methods. Trust region methods assume that for steps $s$ satisfying $\|s\|_2 \le \delta$, the function

$$m_x(s) = \tfrac{1}{2}\|F(x)\|_2^2 + \big((DF)(x)^\top F(x)\big)^\top s + \tfrac{1}{2}\, s^\top (DF)(x)^\top (DF)(x)\, s$$

is an accurate local model of $f(x + s)$, where $f(x) = \|F(x)\|_2^2/2$, for suitably small steps. Note that $m_x$ is not the usual, quadratic model of $f$ derived from a Taylor series because $(DF)(x)^\top (DF)(x) \ne (D\nabla f)(x)$ (Dennis and Schnabel, 1996, pg. 149). The idea is to solve

(30) $$\min_{\|s\|_2 \le \delta} m_x(s).$$

The solution is given as follows: take $s_* = s^N = -(DF)(x)^{-1}F(x)$ if $\|s^N\|_2 \le \delta$; if $\|s^N\|_2 > \delta$, take $s_* = s(\mu_*)$ where

$$s(\mu) = -\big((DF)(x)^\top (DF)(x) + \mu I\big)^{-1}(DF)(x)^\top F(x)$$

and $\mu_* > 0$ is the unique $\mu > 0$ such that $\|s(\mu)\|_2 = \delta$. These follow from the standard optimality conditions, or rather that the gradient $(\nabla m_x)(s_*)$ must lie in the negative normal cone to $\mathbb{B}_\delta(0) = \{y : \|y\|_2 \le \delta\}$ at $s_*$ (Clarke, 1975); see (Dennis and Schnabel, 1996, Lemma 6.4.1, pg. 131).

Solving the problem above exactly generates the Levenberg-Marquardt method (Levenberg, 1944; Marquardt, 1963) or "hookstep." By computing the SVD of $(DF)(x)$ we can easily solve for $s(\mu)$ when $\|s^N\|_2 > \delta$ (Dennis and Schnabel, 1996); see (Golub and Loan, 1996, Section 12.1, pgs. 580-583) for closely related results. Let $(DF)(x) = U\Sigma V^\top$. We can then set $s(\mu) = Vr(\mu)$ where

$$r(\mu) = -\big[(\Sigma^2 + \mu I)^{-1}\Sigma\big] U^\top F(x).$$

A simple single-dimensional iteration can then be used to solve for the unique $\mu_*$ such that $\|s(\mu_*)\|_2 = \delta$. ? derives two globally convergent methods for this task using Newton's method and a nonlinear local model (Dennis and Schnabel, 1996). The difficulty here is computing the SVD of $(DF)(x)$, requiring $\mathcal{O}(J^3)$ flops (Golub and Loan, 1996, Chapter 5, pg. 254).
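
The SVD form of the hookstep can be sketched as follows for a square, nonsingular Jacobian. This is an illustration with hypothetical names, and it substitutes plain bisection on $\mu$ for the Newton-type one-dimensional iterations referred to above, purely to keep the example short.

```python
import numpy as np

def hookstep(Jac, F, delta, tol=1e-12, max_iter=200):
    """Return (s, mu): the Newton step if it fits in the trust region, else s(mu*).

    s(mu) = -(Jac^T Jac + mu I)^{-1} Jac^T F, computed via the SVD of Jac.
    Bisection on mu is an assumption of this sketch, not the method in the text.
    """
    Jac = np.asarray(Jac, dtype=float)
    F = np.asarray(F, dtype=float)
    U, sig, Vt = np.linalg.svd(Jac)
    g = U.T @ F

    def r(mu):
        return -(sig / (sig ** 2 + mu)) * g      # coordinates of s(mu) in the basis V

    s_newton = Vt.T @ r(0.0)
    if np.linalg.norm(s_newton) <= delta:
        return s_newton, 0.0
    # ||s(mu)|| <= sigma_max ||F|| / mu, so this upper end is inside the trust region.
    lo, hi = 0.0, sig.max() * np.linalg.norm(F) / delta
    for _ in range(max_iter):
        mu = 0.5 * (lo + hi)
        if np.linalg.norm(r(mu)) > delta:
            lo = mu
        else:
            hi = mu
        if hi - lo <= tol * hi:
            break
    return Vt.T @ r(hi), hi
```

Because $\|s(\mu)\|_2$ is strictly decreasing in $\mu$, any bracketing root finder works here; once the SVD is available, each trial $\mu$ costs only a few vector operations.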

The step $s_*$ computed by either approach is acceptable if it generates sufficient decrease in the squared 2-norm of $F$. Specifically, fix $\rho \in (0,1)$, $\alpha > 1$, and $\beta_2 < \beta_1 < 1$. If

$$\|F(x)\|_2^2 - \|F(x + s_*)\|_2^2 \ge \rho\big(\|F(x)\|_2^2 - \|F(x) + (DF)(x)s_*\|_2^2\big)$$

then $x \leftarrow x + s_*$ and the step length bound is expanded to a value in $[\delta, \alpha\delta]$ for the next iteration. Otherwise, $\delta$ is chosen from $[\beta_2\delta, \beta_1\delta]$ and the corresponding $s_*$ is computed. While this process of specifying an acceptable $s_*$ is iterative, much of the work required to build a trial step does not need to be repeated. Specifically, the SVD required for the hookstep does not change (so long as it was computed in a previous iteration), while in the dogleg step the Newton and Cauchy steps remain the same. However, every time the step size bound is decreased, $F$ must be re-evaluated at the new trial step, with a computational burden equivalent to taking a fixed-point step.
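The acceptance test can be sketched in a few lines. This is an illustrative sketch only: the name `trust_region_update` and the default parameter values are assumptions, and for simplicity it always takes the endpoint $\alpha\delta$ on acceptance and a single contraction factor `beta` on rejection, not an arbitrary point of the intervals described above.

```python
import numpy as np

def trust_region_update(F, x, s, jac, delta, rho=1e-4, alpha=2.0, beta=0.5):
    """Apply the sufficient-decrease test to a trial step s.

    Returns (x_next, delta_next, accepted). All names and defaults are hypothetical.
    """
    Fx = F(x)
    Ftrial = F(x + s)                    # the one new evaluation of F per trial step
    actual = Fx @ Fx - Ftrial @ Ftrial   # ||F(x)||^2 - ||F(x + s)||^2
    linear = Fx + jac @ s
    predicted = Fx @ Fx - linear @ linear  # decrease predicted by the local model
    if actual >= rho * predicted:
        return x + s, alpha * delta, True   # accept and expand the bound
    return x, beta * delta, False           # reject and shrink the bound
```

For instance, with $F = \arctan$ and a starting point far from the root, the full Newton step overshoots and fails this test, so the bound is reduced before a new trial step is built.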

4.3.2. Model Trust Region Methods on a Subspace. A Krylov method for solving $(DF)(x)s^N = -F(x)$ builds approximate solutions in the successive Krylov subspaces $\mathcal{K}^{(n)}$. This has the effect of further constraining the local model problem (30) to

$$\min_{s \in \mathcal{K}^{(n)},\ \|s\|_2 \le \delta} m_x(s). \tag{31}$$

For any $Q \in \mathbb{R}^{J \times n}$ with orthonormal columns (generated by GMRES or not) we can set $m_{x,Q}(y) = m_x(Qy)$ and restrict attention to the trust region problem $\min_{\|y\|_2 \le \delta} m_{x,Q}(y)$. See (Brown and Saad, 1990, pgs. 149-150). The first-order conditions for this problem are equivalent to either

(i) $(\nabla m_{x,Q})(y) = 0$ and $\|y\|_2 \le \delta$, or

(ii) $(\nabla m_{x,Q})(y) + \mu y = 0$ for $\|y\|_2 = \delta$ and some $\mu > 0$.

By the definition of $m_{x,Q}$, (i) implies

$$Q^\top (DF)(x)^\top (DF)(x) Q y + Q^\top (DF)(x)^\top F(x) = 0$$

and (ii) implies

$$\big(Q^\top (DF)(x)^\top (DF)(x) Q + \mu I\big) y + Q^\top (DF)(x)^\top F(x) = 0.$$

Note that these are square problems that can be solved exactly.
4.3.3. The GMRES-Newton Hookstep. Using GMRES started at zero, $(DF)(x)Q^{(n)} = Q^{(n+1)}H^{(n)}$ and $(Q^{(n+1)})^\top F(x) = -\mathrm{sign}(F_1(x))\|F(x)\|_2 e_1$. Thus we consider the family of $n \times n$ linear systems

$$(Q^{(n)})^\top (DF)(x)^\top (DF)(x) Q^{(n)} q + \mu q + (Q^{(n)})^\top (DF)(x)^\top F(x)$$
$$= \big((H^{(n)})^\top H^{(n)} + \mu I\big) q - \mathrm{sign}(F_1(x))\|F(x)\|_2 (H^{(n)})^\top e_1 = 0$$

defined for all $\mu \ge 0$.

By computing the ("thin") Singular Value Decomposition of $H^{(n)}$, $H^{(n)} = U\Sigma V^\top$ where $U \in \mathbb{R}^{(n+1) \times n}$, $V \in \mathbb{R}^{n \times n}$, and $\Sigma \in \mathbb{R}^{n \times n}$, we can easily solve each such problem. See (Golub and Loan, 1996, Section 12.1, pgs. 580-583) for closely related results. Particularly,

$$\big((H^{(n)})^\top H^{(n)} + \mu I\big) q - \mathrm{sign}(F_1(x))\|F(x)\|_2 (H^{(n)})^\top e_1 = 0$$

is solved by $q(\mu) = V r(\mu)$ where

$$r(\mu) = \mathrm{sign}(F_1(x))\|F(x)\|_2 (\Sigma^2 + \mu I)^{-1}\Sigma U^\top e_1.$$

Because the diagonal elements of $\Sigma$ are positive, $r(\mu)$ is well defined for all $\mu \ge 0$. Note also that we only need the first row of $U$, but all of $V$, to compute $q(\mu)$.

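In particular, $q(0) = \mathrm{sign}(F_1(x))\|F(x)\|_2 V\Sigma^{-1}U^\top e_1$. Invoking the full SVD of $H^{(n)}$,

$$H^{(n)} = \begin{bmatrix} U & u_{n+1} \end{bmatrix} \begin{bmatrix} \Sigma \\ 0^\top \end{bmatrix} V^\top$$

for some $u_{n+1} \perp \mathrm{span}\{u_j\}_{j=1}^{n}$, we can write

$$\big\|H^{(n)}q - \mathrm{sign}(F_1(x))\|F(x)\|_2 e_1\big\|_2 = \left\| \begin{bmatrix} \Sigma V^\top q \\ 0 \end{bmatrix} - \mathrm{sign}(F_1(x))\|F(x)\|_2 \begin{bmatrix} U^\top e_1 \\ u_{1,n+1} \end{bmatrix} \right\|_2.$$

We thus see that $q(0)$ solves the $(n+1) \times n$ GMRES least squares problem

$$\min_q \big\|H^{(n)}q - \mathrm{sign}(F_1(x))\|F(x)\|_2 e_1\big\|_2$$

with residual $\|F(x)\|_2 |u_{1,n+1}|$. The value $|u_{1,n+1}|$ is unique: first, note that $u_{n+1}$ is a unit vector in the span of a single vector, say $v$, that is orthogonal to the span of the columns of $U$. There are only two unit vectors in this span, specifically $\pm v/\|v\|_2$, and thus $u_{n+1} \in \{\pm v/\|v\|_2\}$. Thus $|u_{1,n+1}| = |\pm v_1/\|v\|_2| = |v_1|/\|v\|_2$.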

It is also easy to see that

$$F(x)^\top (DF)(x) s^{(n)}(\mu) = F(x)^\top (DF)(x) Q^{(n)} q^{(n)}(\mu) = \big((Q^{(n+1)})^\top F(x)\big)^\top H^{(n)} q^{(n)}(\mu) = -\|F(x)\|_2^2 \big(\nu_1^\top D(\mu) \nu_1\big) < 0$$

where $\nu_1$ is the first row of $U$ and $D(\mu) = \mathrm{diag}(d_1(\mu), \dots, d_n(\mu))$ for $d_i(\mu) = \sigma_i^2/(\sigma_i^2 + \mu)$. That is, the Householder GMRES-Newton Hookstep always lies in a descent direction for the globalizing objective $f(x) = \|F(x)\|_2^2/2$.
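The identities in this subsection are easy to check numerically. The sketch below uses a small hand-picked upper Hessenberg matrix `H` and a scalar `beta` as stand-ins for $H^{(n)}$ and $\mathrm{sign}(F_1(x))\|F(x)\|_2$; both are illustrative assumptions, not output of an actual GMRES run.

```python
import numpy as np

n = 3
# Stand-in for the (n+1) x n upper Hessenberg matrix H^(n) produced by GMRES.
H = np.array([[1.0, 2.0, 0.5],
              [0.5, 1.0, 1.0],
              [0.0, 0.3, 2.0],
              [0.0, 0.0, 0.4]])
beta = -2.5                      # stand-in for sign(F_1(x)) * ||F(x)||_2
e1 = np.zeros(n + 1)
e1[0] = 1.0

U, sigma, Vt = np.linalg.svd(H, full_matrices=False)  # thin SVD
nu1 = U[0, :]                    # first row of U, i.e. U^T e_1

def q(mu):
    # q(mu) = V r(mu), r(mu) = beta (Sigma^2 + mu I)^{-1} Sigma U^T e_1
    return Vt.T @ (beta * sigma / (sigma**2 + mu) * nu1)

mu = 0.7
# Residual of the regularized normal equations; should vanish.
normal_eq_residual = (H.T @ H + mu * np.eye(n)) @ q(mu) - beta * (H.T @ e1)

# q(0) should agree with the GMRES least squares solution.
q_ls = np.linalg.lstsq(H, beta * e1, rcond=None)[0]
ls_residual = np.linalg.norm(H @ q(0.0) - beta * e1)
predicted_residual = abs(beta) * np.sqrt(1.0 - nu1 @ nu1)  # |beta| |u_{1,n+1}|

# Directional derivative F^T (DF) s = (-beta e_1)^T H q(mu).
slope = -beta * (e1 @ (H @ q(mu)))
predicted_slope = -beta**2 * np.sum(sigma**2 / (sigma**2 + mu) * nu1**2)
```

Only the first row of `U` enters `q`, matching the remark above that the rest of $U$ is not needed.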

4.3.4. Directional Finite Differences. Recall that one advantage to using an iterative solver like GMRES to solve the Newton system is that only products of the type $(DF)(p)s$ will be required to solve the Newton system for $F$ at $p$ (Brown and Saad, 1990; Pernice and Walker, 1998). Such products can be approximated by a single additional evaluation of $F$ in a "directional" finite difference (Brown and Saad, 1990; Pernice and Walker, 1998). For example, the first-order formula

$$(DF)(x)s \approx h^{-1}\big(F(x + hs) - F(x)\big)$$

requires only a single additional evaluation of $F$ per (approximate) evaluation of $(DF)(x)s$. Higher-order formulae requiring 2 and 4 additional evaluations of $F$ are easy to derive; see Pernice and Walker (1998). In their implementation of the GMRES method in the context of an inexact Newton method, Pernice and Walker (1998) only use higher order finite-differencing formulas at restarts. Brown and Saad (1990) provide a practical formula for computing an appropriate value of $h$.
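A minimal sketch of the first-order directional finite difference follows. The function name `directional_fd` is hypothetical, and the default step size is a common square-root-of-machine-epsilon heuristic chosen here for illustration; it is not the Brown and Saad (1990) formula.

```python
import numpy as np

def directional_fd(F, x, s, h=None):
    """First-order approximation of the Jacobian-vector product (DF)(x) s.

    Uses one evaluation of F beyond F(x). Hypothetical helper for illustration.
    """
    Fx = F(x)
    norm_s = np.linalg.norm(s)
    if norm_s == 0.0:
        return np.zeros_like(Fx)  # (DF)(x) 0 = 0 exactly
    if h is None:
        # Illustrative heuristic step size, scaled by the sizes of x and s.
        h = np.sqrt(np.finfo(float).eps) * max(1.0, np.linalg.norm(x)) / norm_s
    return (F(x + h * s) - Fx) / h
```

Inside GMRES, `Fx` would be computed once per Newton iteration and reused, so each Krylov step costs one new evaluation of $F$.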

Since directional finite differences must be repeated at each step of iterative linear solvers, each step of an iterative Newton system solver using directional finite differences could be at least as expensive as a $\zeta$-FPI step. That is, if an iterative solver should take 100 steps to compute an inexact Newton step having small enough residual to satisfy the inexact Newton condition, then we could have equivalently taken 100, 200, and 400 $\zeta$-FPI steps with the first, second, and fourth order formulae available in Pernice and Walker (1998). In our examples, using GMRES regularly solves the $\eta$-NM and $\zeta$-NM Newton systems in approximately 10 steps. This implies that each $\eta$-NM and $\zeta$-NM step is roughly equivalent to 10 $\zeta$-FPI steps.

In the Newton context, whether efficiency is ultimately gained by using directional finite differences instead of computing the Jacobian matrices and using standard matrix-vector products depends on the number of steps taken by the iterative linear solver. If GMRES takes $N \in \mathbb{N}$ iterations to find an inexact Newton step for $F$, computing and using the Jacobian requires $O((S + N)J^2)$ flops while using directional finite differences requires
0{SN^^^^jj) flops. 

We have observed that for $\eta$-NM and $\zeta$-NM, using the actual Jacobian takes roughly half the computation time of using directional finite differences, even though GMRES converges in very few iterations ($N \approx 10$). Fig. 2 plots the sample trials for the Boyd and Mellman (1980) model provided in Morrow and Skerlos (2010) using both analytical Jacobians and directional finite differences. First note that the $\zeta$-FPI regularly takes about 1 s per iteration. For k = 1 USD, the single-step convergence of the GMRES-Newton Hookstep method translates into about 10 $\zeta$-FPI steps, or about 10 s. Because GMRES itself requires some small overhead ($O(Jn)$ in the $n$-th step), this is a somewhat reasonable estimate of the work required. Two GMRES-Newton steps are required with k = 10 USD and we would expect about 20 s, a somewhat less sound estimate of the time required. Three GMRES-Newton steps are required with k = 100 USD, leading us to expect about 30 s, an even less sound estimate of the time required. (These observations can be matched with an asymptotic analysis of the work required.) Note also that the $\eta$-NM has the greatest increase in time as a consequence of using the directional finite differences. This is a consequence of having to repeat block QR factorizations when evaluating $\eta$ at different points, while evaluating $(D\eta)$ requires only a single factorization.

Fig. 3 plots the sample trials for the Berry et al. (1995) model provided in Morrow and 
Skerlos (2010) using both analytical Jacobians and directional finite differences. Interest- 
ingly, in this case use of the directional finite differences appears to generate a convergence 
rate improvement. Otherwise, the story remains much the same as that discussed above 
for the Boyd and Mellman (1980) model. 

5. Other Methods 

5.1. Variational Methods. Equilibrium problems are commonly formulated as variational inequalities or complementarity problems (Harker and Pang, 1990; Ferris and Pang, 1997). To be nontrivially distinct from nonlinear equations, such formulations require restricting the variables to a proper, convex subset of $\mathbb{R}^J$. When $\varsigma_* < \infty$ there is an appropriate variational formulation of the equilibrium pricing problem:

$$\text{find } p \in [0, \varsigma_*]^J \text{ such that } (\nabla\hat{\pi})(p)^\top (p - q) \ge 0 \text{ for all } q \in [0, \varsigma_*]^J. \tag{32}$$

5.1.1. The VI formulation is poorly posed. Unfortunately, the Variational Inequality (32) is poorly posed when the derivatives of profit vanish as prices approach $\varsigma_* < \infty$. There are two specific issues with Eqn. (32) in this case. First, $\varsigma_*\mathbf{1} \in \mathcal{P}^J$ is always a solution but never an equilibrium when profits vanish as all prices approach $\varsigma_*$; see Section ?? and Prop. 5.1. Second, Eqn. (32) can be solved by any equilibrium of any differentiated product market model constructed with a subset of the products offered (Prop. 5.2). Equilibria of such "sub-problems" are not necessarily equilibria of the original problem, as demonstrated in Example 8 below. This issue with Eqn. (32) is, in fact, equivalent to the problem with CG-NM discussed in Section 3.2.

These issues imply that variational methods can compute many "spurious" solutions. If an equilibrium problem and all its sub-problems have unique equilibria with all prices less than $\varsigma_*$, Eqn. (32) has $2^J$ solutions that might be recovered by a global method such as PATH (Ralph, 1994; Dirkse and Ferris, 1995). However, only one of these solutions is an equilibrium of the original problem, by assumption. A simple example demonstrates this phenomenon.

Example 8. Consider a monopoly with two products produced at the same unit cost $c$. Demand is given by a simple Logit model with product-specific utility functions $u_j(p_j) = \alpha\log(\varsigma - p_j) + v_j$ for $j \in \{1, 2\}$, where $\varsigma \in (c, \infty)$, $v_1, v_2 \in \mathbb{R}$, and $\vartheta > -\infty$. The firm has unique profit-maximizing prices $(p_1^*, p_2^*)$. Furthermore $p_1^*, p_2^* < \varsigma$ and $(p_1^*, p_2^*)$ is the unique fixed-point of the map $c + \zeta(\cdot)$ on all of $\mathcal{P}^2$.

However the variational inequality formulation contains four distinct solutions, only one of which is profit-maximizing. These four solutions are $(p_1^*, p_2^*)$, $(q_1^*, \varsigma)$, $(\varsigma, q_2^*)$, and $(\varsigma, \varsigma)$, where $q_j^* < \varsigma$ for $j \in \{1, 2\}$ are the unique profit-maximizing prices that exist should the firm offer only product 1 or 2. Only the first solution, $(p_1^*, p_2^*)$, is profit-maximizing.

Figure 2. Typical convergence curves for perturbation trials under the Boyd and Mellman (1980) model using both analytical and directional finite difference Jacobians. See also Fig. ??. Convergence curves for analytical Jacobian are drawn with solid lines, whereas convergence curves for directional finite differences are drawn with dashed lines of the same color.

Figure 3. Typical convergence curves for perturbation trials under the Berry et al. (1995) model using both analytical and directional finite difference Jacobians. See also Fig. ??. Convergence curves for analytical Jacobian are drawn with solid lines, whereas convergence curves for directional finite differences are drawn with dashed lines of the same color.

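Proof. We complete the details of Example 8.

Consider a monopoly with two products produced at the same unit cost ($c = c_1 = c_2 > 0$), $\vartheta > -\infty$, and simple Logit model with utility

$$u_1(p_1) = \alpha\log(\varsigma - p_1) + v_1 \quad\text{and}\quad u_2(p_2) = \alpha\log(\varsigma - p_2) + v_2$$

for some fixed $\varsigma \in (c, \infty)$, $\alpha > 1$, and arbitrary $v_1, v_2 \in \mathbb{R}$. Let $p_2 < \varsigma$ and observe that

$$\lim_{p_1 \uparrow \varsigma}\big(p_1 - c - \zeta_1(p_1, p_2)\big) = \varsigma - c - P_2(\varsigma, p_2)(p_2 - c) = (\varsigma - c)\left(1 - P_2(\varsigma, p_2)\,\frac{p_2 - c}{\varsigma - c}\right).$$

Since $p_2 < \varsigma$ and $P_2(p_1, p_2) < 1$ for all $p_1, p_2$, we have $\lim_{p_1 \uparrow \varsigma}(p_1 - c - \zeta_1(p_1, p_2)) > 0$. Thus $(D_1\hat{\pi})(p_1, p_2) < 0$ for all $p_1$ sufficiently close to $\varsigma$. A similar argument can be made for $(D_2\hat{\pi})(p_1, p_2)$.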

Note also that this proves that $\varsigma + \varepsilon > c + \zeta_1(\varsigma + \varepsilon, p_2)$ for any $\varepsilon > 0$ and $p_2$, where $\zeta_1$ is the extended map. A similar result holds for $\zeta_2$, instead of $\zeta_1$. Thus no $(p_1, p_2)$ outside of $(0, \varsigma)^2$ is fixed for the extended map $c + \zeta(p)$.

We now prove that there exists a unique pair of profit-maximizing prices $p^* = (p_1^*, p_2^*) \in (0, \varsigma)^2$. Since

$$\lim_{p_j \uparrow \varsigma}\big(p_j - c - \zeta_j(p_1, p_2)\big) < \infty$$

for $j \in \{1, 2\}$, $\zeta = (\zeta_1, \zeta_2)$ is bounded and continuous on $\mathcal{P}^2$. By Brouwer's fixed-point theorem, there exists a stationary point $p^* = (p_1^*, p_2^*)$. Both prices must be less than $\varsigma$, since profits decrease for all prices sufficiently close to $\varsigma$. We now show that these prices are also unique, borrowing a technique from Morrow and Skerlos (2008).

The first step is to prove that $(D\nabla\hat{\pi})(p^*)$ is negative definite at any stationary $p^*$. Note that $(D\nabla\hat{\pi})(p^*) = \Lambda(p^*)(I - (D\zeta)(p^*))$; this relationship is valid for Mixed Logit models with multiple firms as well. Furthermore $\zeta_k(p) = \hat{\pi}(p) - (Dw_k)(p_k)^{-1}$ for any simple Logit model and any number of products. Hence

$$(D_k\zeta_j)(p) = (D_k\hat{\pi})(p) + \delta_{j,k}\left(\frac{(D^2w_k)(p_k)}{(Dw_k)(p_k)^2}\right).$$

Because $(D_k\hat{\pi})(p^*) = 0$ at any stationary point, $I - (D\zeta)(p^*)$ is a diagonal matrix with elements

$$1 - \frac{(D^2w_k)(p_k^*)}{(Dw_k)(p_k^*)^2}.$$

In the case of this example,

$$1 - \frac{(D^2w_1)(p_1)}{(Dw_1)(p_1)^2} = 1 - \frac{(D^2w_2)(p_2)}{(Dw_2)(p_2)^2} = 1 + \frac{1}{\alpha} > 0.$$

Since $\Lambda(p^*)$ is diagonal with negative entries, $(D\nabla\hat{\pi})(p^*)$ is negative definite at any stationary point, and any stationary point maximizes profits.
The next step is to prove that the existence of only maximizers of profits implies that there is a unique pair of profit-maximizing prices. Morrow and Skerlos (2008) accomplish this with an application of the Poincaré-Hopf theorem (Milnor, 1965), as follows. Consider $-\hat{\pi}(p)$. This function is minimized at any stationary $p^* = (p_1^*, p_2^*)$, and thus the gradient vector field $-(\nabla\hat{\pi})(p)$ has index 1 at any stationary point $p^*$ (Milnor, 1965). Note also that

$$\mathrm{sign}\{-(D_j\hat{\pi})(p_1, p_2)\} = \mathrm{sign}\left\{p_j - c - \hat{\pi}(p_1, p_2) - \frac{\varsigma - p_j}{\alpha}\right\}$$

for $j \in \{1, 2\}$. This equation shows that the gradient vector field $-(\nabla\hat{\pi})(p)$ points outward on the boundary of the compact, convex set $[c, \varsigma]^2$, as can be checked. Thus the Poincaré-Hopf theorem states that the sum of the indices of the critical (stationary) points equals one, the Euler characteristic of $[c, \varsigma]^2$. Since the index of any critical (stationary) point of $-(\nabla\hat{\pi})(p)$ is one, there can only be one stationary point.

Using similar arguments, we see that the sub-problems formed by offering product 1 or product 2 alone also have unique profit-maximizing prices $q_1^*$ and $q_2^*$, respectively. Because $v_1$ and $v_2$ may be distinct, these prices need not be the same.

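We have claimed that the variational formulation of this problem has four solutions, only one of which is an equilibrium. Indeed, these four solutions are $(p_1^*, p_2^*)$, $(q_1^*, \varsigma)$, $(\varsigma, q_2^*)$, and $(\varsigma, \varsigma)$ but, as shown above, only $(p_1^*, p_2^*)$ is an equilibrium. While this follows from Props. 5.1 and 5.2, we prove it directly here. Of course, $(p_1^*, p_2^*)$ is a solution since $(\nabla\hat{\pi})(p_1^*, p_2^*) = (0, 0)$. Since

$$\lim_{p_j \uparrow \varsigma} \lambda_j(p_1, p_2) = \lim_{p_j \uparrow \varsigma}\left[-\frac{\alpha}{\varsigma - p_j}\,P_j(p_1, p_2)\right] = 0$$

for $j \in \{1, 2\}$ (because $P_j$ is proportional to $(\varsigma - p_j)^\alpha$ and $\alpha > 1$), $\lim_{p_j \uparrow \varsigma}(D_j\hat{\pi})(p_1, p_2) = 0$ (i.e., Assumption 3.1 holds). Thus $(\nabla\hat{\pi})(\varsigma, \varsigma) = (0, 0)$, and the variational inequality is satisfied at $(\varsigma, \varsigma)$. Furthermore,

$$(D_1\hat{\pi})(\varsigma, q_2^*)(\varsigma - q_1) + (D_2\hat{\pi})(\varsigma, q_2^*)(q_2^* - q_2) = (D_2\hat{\pi})(\varsigma, q_2^*)(q_2^* - q_2) = 0$$

for any $(q_1, q_2)$, and thus $(\varsigma, q_2^*)$ is also a solution to the variational inequality. Similarly, $(q_1^*, \varsigma)$ is also a solution. This completes the proof. □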
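The spurious solutions of Example 8 can be reproduced numerically. The sketch below is illustrative only: the parameter values are arbitrary choices satisfying $\alpha > 1$ and $c < \varsigma$, the function names are hypothetical, and bisection on $p_j - c - \zeta_j(p)$ is used as a simple stand-in for the fixed-point and Newton methods studied in this paper.

```python
import numpy as np

# Illustrative parameter values (assumptions): alpha > 1 and COST < SIGMA.
ALPHA, SIGMA, COST, THETA0 = 2.0, 10.0, 1.0, 0.0
V = np.array([0.0, 0.5])        # arbitrary product-specific constants v_1, v_2

def shares_and_lambda(p):
    # Logit shares P_j and lambda_j = (Dw_j)(p_j) P_j, written without dividing
    # by (SIGMA - p_j) so that both remain well defined at p_j = SIGMA.
    gap = SIGMA - p
    denom = np.exp(THETA0) + np.sum(gap**ALPHA * np.exp(V))
    P = gap**ALPHA * np.exp(V) / denom
    lam = -ALPHA * gap**(ALPHA - 1.0) * np.exp(V) / denom
    return P, lam

def profit(p):
    P, _ = shares_and_lambda(p)
    return P @ (p - COST)

def gradient(p):
    # (D_j pi)(p) = P_j + lambda_j (p_j - c - pi(p))
    P, lam = shares_and_lambda(p)
    return P + lam * (p - COST - P @ (p - COST))

def single_product_price(j, iters=100):
    # Bisection on g(q) = q - c - zeta_j, with the other product priced at SIGMA
    # (zero share), i.e. the sub-problem in which only product j is offered.
    lo, hi = COST, SIGMA
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        p = np.full(2, SIGMA)
        p[j] = mid
        g = mid - COST - profit(p) - (SIGMA - mid) / ALPHA
        if g < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

q1_star = single_product_price(0)
q2_star = single_product_price(1)
# The gradient vanishes at each of these points, so each solves the VI (32).
spurious = [np.array([q1_star, SIGMA]),
            np.array([SIGMA, q2_star]),
            np.array([SIGMA, SIGMA])]
```

Each point in `spurious` has a vanishing gradient, yet reintroducing the excluded product at a price below $\varsigma$ raises profit, so none of them is profit-maximizing for the two-product problem.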

Example 8 is easily generalized to include $J > 2$ products and a variational inequality with $2^J$ solutions. One of these solutions is the unique vector of profit-maximizing prices for the original problem, one is $\varsigma\mathbf{1} \in \mathcal{P}^J$ and is not profit-maximizing for any sub-problem, and the rest are profit-maximizing for some sub-problem but not profit-maximizing for the original problem.

This property of variational formulations is especially problematic since computations of equilibrium prices must often be performed using models with $\varsigma_* < \infty$. Such models may be derived from simulation-based approximations to Mixed Logit models with reservation prices that are finite $\mu$-a.e., as in the Berry et al. (1995) model of Example 2.

Fortunately methods based on the $\zeta$ map resolve only equilibria of the original problem. In Section 5.1.3 we consider the important class of simulation-based approximations to Mixed Logit models like those from Example 2 and prove that fixed-points of $c + \zeta(\cdot)$ cannot be equilibria of a sub-problem that are not equilibria of the original model. This is essentially a consequence of Eqn. (21), which connects the sign of $(D_k\hat{\pi}_{f(k)})(p)$ directly to the sign of $p_k - c_k - \zeta_k(p)$.

Similar results may apply to the markup equation. However, because Eqn. (20) involves $(DP)(p)^\top$ instead of simply the diagonal matrix $\Lambda(p)$, the relationship between the sign of $p_k - c_k - \eta_k(p)$ and the sign of $(D_k\hat{\pi}_{f(k)})(p)$ is not clear.

5.1.2. General Results. We now prove the results stated above concerning a variational formulation of the price equilibrium problem when $\varsigma_* < \infty$.

Proposition 5.1. Suppose $\varsigma_* < \infty$ and Assumptions 2.1-3.1 hold. Then the variational inequality (32) always contains $\varsigma_*\mathbf{1} \in \mathcal{P}^J$ as a solution.

Proof. Since $(\nabla\hat{\pi})(\varsigma_*\mathbf{1}) = 0$, Eqn. (32) is trivially satisfied. □

The following proposition states that this variational formulation is poorly posed in the sense that it contains solutions to all sub-problems.

Proposition 5.2. Let $\varsigma_* < \infty$ and Assumptions 2.1-3.1 hold. Consider a proper subset $\mathcal{J}' \subset \mathbb{N}(J)$ of $J' = |\mathcal{J}'|$ product indices, and any solution $p^*_{\mathcal{J}'} = \{p_j^* : j \in \mathcal{J}'\}$ to the sub-variational inequality

$$\sum_{j \in \mathcal{J}'} (D_j\hat{\pi}_{f(j)})(p^*_{\mathcal{J}'})(p_j^* - q_j) \ge 0 \quad\text{for all } q_{\mathcal{J}'} = \{q_j : j \in \mathcal{J}'\} \in [0, \varsigma_*]^{J'}.$$

If we define $p \in [0, \varsigma_*]^J$ by $p_j = p_j^*$ for all $j \in \mathcal{J}'$ and $p_k = \varsigma_*$ for all $k \notin \mathcal{J}'$ then $p$ solves the full variational inequality (32).

Proof. Because

$$(D_j\hat{\pi}_{f(j)})(p) = \begin{cases} (D_j\hat{\pi}_{f(j)})(p^*_{\mathcal{J}'}) & \text{if } j \in \mathcal{J}' \\ 0 & \text{if } j \notin \mathcal{J}' \end{cases}$$

we have

$$\sum_{j=1}^{J} (D_j\hat{\pi}_{f(j)})(p)(p_j - q_j) = \sum_{j \in \mathcal{J}'} (D_j\hat{\pi}_{f(j)})(p^*_{\mathcal{J}'})(p_j^* - q_j) \ge 0$$

for all $q \in [0, \varsigma_*]^J$. □

5.1.3. The Resolution of Equilibria with $\zeta$. We have shown that variational formulations of the equilibrium problem nest equilibria of all sub-problems, which may not be equilibria of the original problem, as Example 8 demonstrates. In this section we show that methods based on the $\zeta$ map need not have this unfortunate shortcoming. This result strongly distinguishes nonlinear system methods based on the $\zeta$ map from variational approaches.

We motivate this result with an example.

Example 9. Consider a finite-sample approximation to the Berry et al. (1995) model of Example 2. That is, choose $S \in \mathbb{N}$ and draw $\{\theta_s\}_{s=1}^{S}$ where $\theta_s = (\phi_s, \beta_s, \beta_{0,s})$. These samples could be drawn via standard sampling from $\mu$ or from another technique like importance or quasi-random sampling. In any case, suppose that the $\phi$'s drawn are distinct with probability one: $\phi_s \neq \phi_r$ for all $s, r \in \mathbb{N}(S)$ with probability one. Without loss of generality we take $\phi_1 < \phi_2 < \cdots < \phi_S$, and note that $\varsigma_* = \phi_S < \infty$. If $p = c + \zeta(p)$ and $p_k > \varsigma_*$, then firm $f(k)$'s profits increase with the price of the $k$-th product in some neighborhood of $\varsigma_*$.

Thus if we compute some fixed-point $p = c + \zeta(p)$ with $p_k > \varsigma_*$ we know that excluding product $k$ is profit-optimal for firm $f(k)$. As shown in Example 8, this is not the case with the VI formulation.

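Proof. We will first define $\zeta$ on all of $\mathcal{P}^J$, and then consider fixed-points $p = c + \zeta(p)$ with $p_k > \phi_S = \varsigma_*$.

To extend $\zeta$ we define

$$\zeta_k(p_1, \dots, p_k, \dots, p_J) = \zeta_k(p_1, \dots, \varsigma_*, \dots, p_J) = \lim_{q \uparrow \varsigma_*} \zeta_k(p_1, \dots, q, \dots, p_J)$$

when $p_k > \varsigma_*$. Note that for all $k$ and all $p \in (0, \varsigma_*)^J$, $\zeta_k(p)$ is a ratio of sample averages over the draws $\theta_1, \dots, \theta_S$.

To define $\lim_{p_k \uparrow \varsigma_*} \zeta_k(p)$, we first note that for all $p_k \in (\phi_{S-1}, \phi_S) \neq \emptyset$ we have

$$\zeta_k(p) = \sum_{j \in \mathcal{J}_{f(k)}} P_j^L(\theta_S, p)(p_j - c_j) + \frac{\phi_S - p_k}{\alpha}$$

since $p_k > \phi_s$ for all $s \in \{1, \dots, S - 1\}$. Thus

$$\lim_{p_k \uparrow \varsigma_*} \zeta_k(p) = \sum_{j \in \mathcal{J}_{f(k)} \setminus \{k\}} \left[\lim_{p_k \uparrow \varsigma_*} P_j^L(\theta_S, p)\right](p_j - c_j).$$

In other words, as $p_k$ approaches $\phi_S = \varsigma_*$, $\zeta_k$ approaches the profits firm $f(k)$ accrues from selling all products other than product $k$ to the sampled individual with the highest income. This establishes that the extended $\zeta$ is well-defined and continuous.

Now suppose $p_k = c_k + \zeta_k(p)$, where $p_k > \phi_S = \varsigma_*$. Thus

$$0 = p_k - c_k - \zeta_k(p) > \phi_S - c_k - \zeta_k(p) = \lim_{q_k \uparrow \varsigma_*}\big[q_k - c_k - \zeta_k(p_1, \dots, q_k, \dots, p_J)\big]$$

and there must exist some $\delta > 0$ such that

$$q_k - c_k - \zeta_k(p_1, \dots, q_k, \dots, p_J) < 0$$

for all $q_k \in (\varsigma_* - \delta, \varsigma_*)$. Hence $(D_k\hat{\pi}_{f(k)})(p_1, \dots, q_k, \dots, p_J) > 0$ for all such $q_k$:

$$(D_k\hat{\pi}_{f(k)})(p_1, \dots, q_k, \dots, p_J) = \lambda_k(p_1, \dots, q_k, \dots, p_J)\big(q_k - c_k - \zeta_k(p_1, \dots, q_k, \dots, p_J)\big) > 0.$$

In other words, if $p = c + \zeta(p)$ and $p_k > \varsigma_*$ then firm $f(k)$'s profits increase with the price of the $k$-th product in some neighborhood of $\varsigma_*$. □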
Fortunately this example is fairly general. In the following proposition we prove that all finite-sample simulators generate $\zeta$ maps that do not have equilibria of sub-problems as fixed points unless they are, in fact, equilibria of the original problem. Three assumptions are added: utilities must be twice continuously differentiable in prices, $\varsigma(\theta)$ is finite $\mu$-a.e. as in the Berry et al. (1995) model, and the sampled values $\varsigma(\theta_s)$ must be distinct with probability one.

Proposition 5.3. Consider a Mixed Logit model satisfying Assumptions 2.1, 2.3, and 3.1 with $w_j(\theta, \cdot) : (0, \varsigma(\theta)) \to \mathbb{R}$ twice continuously differentiable in price and $\varsigma : \mathcal{T} \to \mathcal{P}$ finite $\mu$-a.e.

Generate a finite-sample simulator to this Mixed Logit model with $\{\theta_s\}_{s=1}^{S}$ for some $S \in \mathbb{N}$. Let $\varsigma_s = \varsigma(\theta_s)$ and assume that $\varsigma_s \neq \varsigma_r$ with probability one for any $s \neq r$. Subsequently, order the samples so that $\varsigma_1 < \cdots < \varsigma_S = \varsigma_*$.

Suppose that $p \in \mathcal{P}^J$ satisfies $p = c + \zeta(p)$ where $\zeta$ is the extended map as in Example 9. If $p_k \ge \varsigma_S$, then excluding product $k$ is profit-optimal for firm $f = f(k)$; particularly, there exists $\delta > 0$ such that

$$(D_k\hat{\pi}_{f(k)})(p_1, \dots, q_k, \dots, p_J) > 0$$

for all $q_k \in (\varsigma_S - \delta, \varsigma_S)$.

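Proof. The case $p_k > \varsigma_S$ is handled exactly as in Example 9. We must only consider the case where

$$0 = \varsigma_S - c_k - \lim_{p_k \uparrow \varsigma_S} \zeta_k(p) = \lim_{p_k \uparrow \varsigma_S}\big[p_k - c_k - \zeta_k(p)\big].$$

Our approach is to show that $D_k[p_k - c_k - \zeta_k(p)] > 0$ for all $p_k$ near enough to $\varsigma_S$, and thus

$$p_k - c_k - \zeta_k(p) = p_k - c_k - \zeta_k(p) - \Big(\varsigma_S - c_k - \lim_{p_k \uparrow \varsigma_S} \zeta_k(p)\Big) = -\int_{p_k}^{\varsigma_S} D_k\big[p_k - c_k - \zeta_k(p)\big]\,dp_k < 0$$

(with a slight abuse of notation in the integral). More specifically, we prove that $\lim_{p_k \uparrow \varsigma_S} D_k[p_k - c_k - \zeta_k(p)] > 0$, which implies that $D_k[p_k - c_k - \zeta_k(p)] > 0$ for all $p_k$ near enough to $\varsigma_S$. Because $p_k - c_k - \zeta_k(p) < 0$ for $p_k$ near enough to $\varsigma_S$,

$$(D_k\hat{\pi}_{f(k)})(p_1, \dots, q_k, \dots, p_J) = \lambda_k(p_1, \dots, q_k, \dots, p_J)\big[q_k - c_k - \zeta_k(p_1, \dots, q_k, \dots, p_J)\big] > 0$$

for all $q_k$ near enough to $\varsigma_S$.

As in Example 9, note that for all $p_k \in (\varsigma_{S-1}, \varsigma_S)$ we have

$$\zeta_k(p) = \sum_{j \in \mathcal{J}_{f(k)}} P_j^L(\theta_S, p)(p_j - c_j) - \frac{1}{(Dw_k)(\theta_S, p_k)}$$

since $p_k > \varsigma_s$ for all $s \in \{1, \dots, S - 1\}$. From this equation we derive

$$(D_k\zeta_k)(p) = \sum_{j \in \mathcal{J}_{f(k)}} (D_kP_j^L)(\theta_S, p)(p_j - c_j) + P_k^L(\theta_S, p) + \frac{(D^2w_k)(\theta_S, p_k)}{(Dw_k)(\theta_S, p_k)^2}$$

and thus, using the Logit derivative $(D_kP_j^L) = (Dw_k)P_j^L(\delta_{j,k} - P_k^L)$,

$$D_k\big[p_k - c_k - \zeta_k(p)\big] = 1 - (Dw_k)(\theta_S, p_k)P_k^L(\theta_S, p)\Big[(p_k - c_k) - \sum_{j \in \mathcal{J}_{f(k)}} P_j^L(\theta_S, p)(p_j - c_j)\Big] - P_k^L(\theta_S, p) - \frac{(D^2w_k)(\theta_S, p_k)}{(Dw_k)(\theta_S, p_k)^2}.$$

Now $\lim_{p_k \uparrow \varsigma_S} P_k^L(\theta_S, p) = 0$, we have assumed that

$$\lim_{p_k \uparrow \varsigma_S}\big[(Dw_k)(\theta_S, p_k)P_k^L(\theta_S, p)\big] = 0$$

(Assumption 3.1), and

$$\lim_{p_k \uparrow \varsigma_S} \sum_{j \in \mathcal{J}_{f(k)}} P_j^L(\theta_S, p)(p_j - c_j) = \sum_{j \in \mathcal{J}_{f(k)} \setminus \{k\}} \Big[\lim_{p_k \uparrow \varsigma_S} P_j^L(\theta_S, p)\Big](p_j - c_j) < \infty,$$

so we have

$$\lim_{p_k \uparrow \varsigma_S} D_k\big[p_k - c_k - \zeta_k(p)\big] = 1 - \lim_{p_k \uparrow \varsigma_S}\left[\frac{(D^2w_k)(\theta_S, p_k)}{(Dw_k)(\theta_S, p_k)^2}\right].$$

So long as

$$\lim_{p_k \uparrow \varsigma_S}\left[\frac{(D^2w_k)(\theta_S, p_k)}{(Dw_k)(\theta_S, p_k)^2}\right] < 1$$

we have $\lim_{p_k \uparrow \varsigma_S} D_k[p_k - c_k - \zeta_k(p)] > 0$. This must be true, as Claim 1 below demonstrates. This completes the proof. □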



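Claim 1. Let $w : (0, \varsigma) \to \mathbb{R}$ be twice continuously differentiable, with $(Dw)(p) < 0$ for all $p \in (0, \varsigma)$ and $(Dw)(p) \downarrow -\infty$ as $p \uparrow \varsigma$. Then

$$\lim_{p \uparrow \varsigma}\left[\frac{(D^2w)(p)}{(Dw)(p)^2}\right] < 1.$$

Proof. We prove this by contradiction. Note that

$$D\left[\frac{1}{|(Dw)(p)|}\right] = \frac{(D^2w)(p)}{(Dw)(p)^2}.$$

Now if

$$\lim_{p \uparrow \varsigma}\left[\frac{(D^2w)(p)}{(Dw)(p)^2}\right] \ge 1,$$

there must exist some $\bar{p} \in (0, \varsigma)$ such that

$$\frac{(D^2w)(p)}{(Dw)(p)^2} > 0 \quad\text{for all } p \in [\bar{p}, \varsigma).$$

But then

$$0 < \int_{\bar{p}}^{\varsigma} D\left[\frac{1}{|(Dw)(q)|}\right] dq = \lim_{p \uparrow \varsigma}\left[\frac{1}{|(Dw)(p)|}\right] - \frac{1}{|(Dw)(\bar{p})|} = -\frac{1}{|(Dw)(\bar{p})|} < 0,$$

a contradiction. □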

5.2. Tatonnement. Some authors iterate best responses (i.e., tatonnement) to compute equilibria. See, for example, Choi et al. (1990); CBO (2003); Michalek et al. (2004); Austin and Dinan (2005); Bento et al. (2005); Hu and Ralph (2007). Each best-response computation requires Newton's method, or another algorithm for (unconstrained) optimization. Tatonnement should be an efficient way to compute "equilibrium" if all firms' profit-maximizing prices are independent of their competitors' decisions, but wasteful if some firm's optimal pricing depends heavily on its competitors' prices. Furthermore, no convergence guarantees exist for tatonnement, while there are at least theoretical guarantees that Newton's method, properly constructed, will converge to simultaneously stationary prices.
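For concreteness, a tatonnement loop can be sketched on a toy model. Everything here is an assumption made for illustration: a simple Logit duopoly with linear-in-price utility and one product per firm (not one of the models studied in this paper), hypothetical function names, and bisection on each firm's first-order condition in place of Newton's method for the inner best-response problem.

```python
import math

# Toy model (assumption): utility v_f - A * p_f, outside-good utility V0.
A, COST, V0 = 1.0, 1.0, 0.0
V = [0.0, 0.3]

def share(f, prices):
    expu = [math.exp(V[j] - A * prices[j]) for j in range(2)]
    return expu[f] / (math.exp(V0) + sum(expu))

def foc(f, price, prices):
    # First-order condition of firm f: p - c - 1 / (A (1 - P_f)) = 0.
    trial = list(prices)
    trial[f] = price
    return price - COST - 1.0 / (A * (1.0 - share(f, trial)))

def best_response(f, prices, iters=200):
    # foc is increasing in the firm's own price, so bracket the root and bisect.
    lo, hi = COST, COST + 1.0
    while foc(f, hi, prices) < 0.0:
        hi += 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if foc(f, mid, prices) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def tatonnement(prices, sweeps=100):
    # Each sweep solves one optimization problem per firm, holding rivals fixed.
    prices = list(prices)
    for _ in range(sweeps):
        for f in range(2):
            prices[f] = best_response(f, prices)
    return prices

equilibrium = tatonnement([COST, COST])
```

Even in this benign case every sweep requires a full inner solve per firm, which illustrates why the approach becomes wasteful when best responses depend strongly on rivals' prices.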

5.3. Least-Squares Minimization and the Gauss-Newton Method. In principle one could also use optimization methods to explicitly minimize $f(p) = \|F(p)\|_2^2/2$ for any of our choices of $F$. In fact, line search and trust-region strategies for global convergence implicitly minimize this function (Dennis and Schnabel, 1996). Computations of equilibrium prices benefit from leaving this implicit, as explicit minimization via Newton's method requires second-order derivatives of $F$, and hence third-order derivatives of the underlying utility and profit functions, increasing both differentiability requirements and computational burden. The Gauss-Newton method (Ortega and Rheinboldt, 1970) is obtained by neglecting the influence of these higher-order derivatives. This defines the Gauss-Newton step as a solution to the (symmetric) normal equation $(DF)(p)^\top (DF)(p)s = -(DF)(p)^\top F(p)$; note that the same problem arises should one wish to use the Conjugate Gradient method to solve the Newton system. So long as $(DF)(p)$ is nonsingular the standard Newton steps will be recovered from the Gauss-Newton method. However, they are explicitly formulated as solutions to linear systems that are more poorly conditioned (Golub and Loan, 1996; Trefethen and Bau, 1997) and thus we should at least expect to accumulate more error in the process of solving for the same steps. The burden of computing these steps also increases because of the requirement to multiply by the transpose of the Jacobian of $F$.
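Both observations can be checked on a small example. The matrix `jac` and vector `resid` below are arbitrary illustrative stand-ins for $(DF)(p)$ and $F(p)$, not values from any model in this paper.

```python
import numpy as np

jac = np.array([[2.0, 1.0],
                [1.0, 3.0]])     # illustrative nonsingular Jacobian (DF)(p)
resid = np.array([1.0, -2.0])    # illustrative residual F(p)

# Newton step: (DF) s = -F.
newton_step = np.linalg.solve(jac, -resid)

# Gauss-Newton step from the normal equation: (DF)^T (DF) s = -(DF)^T F.
gauss_newton_step = np.linalg.solve(jac.T @ jac, -(jac.T @ resid))

# In the 2-norm, the normal-equation matrix has the squared condition number.
cond_newton = np.linalg.cond(jac)
cond_normal = np.linalg.cond(jac.T @ jac)
```

The two steps coincide, but `cond_normal` equals `cond_newton` squared, which is the sense in which the Gauss-Newton systems are more poorly conditioned.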